Preface


Over the years, the field of intelligent vehicles has become a major research theme
in intelligent transportation systems, since traffic accidents are a serious and growing
problem all over the world. The goal of an intelligent vehicle is to augment the vehicle
with fully or partly autonomous driving for the purposes of safety, comfort, and
energy saving. Indeed, many intelligent vehicle technologies are rooted in
autonomous mobile robots. The tasks of intelligent vehicles are even more
challenging than those of indoor mobile robots for two reasons. First, real-time
perception and modeling of complex dynamic environments challenges current indoor
robot technologies. Autonomous intelligent vehicles have to complete the basic
procedures (perceiving and modeling the environment, localizing and building maps,
planning paths and making decisions, and controlling the vehicle) within a limited
time to meet real-time requirements. Meanwhile, we face the challenge of processing
large amounts of data from multiple sensors, such as cameras, lidars, and radars. This
is extremely hard in more complex outdoor environments, so we have to implement
those tasks in more efficient ways. Second, vehicle motion control faces strongly
nonlinear characteristics due to the high mass of the vehicle, especially at high speed
and during sudden steering. In this case, both the lateral and longitudinal control
algorithms of indoor robots do not work well.

This book presents our recent research work on intelligent vehicles and is aimed
at researchers and graduate students interested in intelligent vehicles. Our goal
in writing this book is threefold. First, it provides an up-to-date reference book on
intelligent vehicles. Second, this book not only presents object/obstacle detection and
recognition, but also introduces vehicle lateral and longitudinal control algorithms,
which benefits readers keen to learn broadly about intelligent vehicles. Finally,
we put emphasis on high-level concepts, and at the same time provide the low-level
details of implementation. We try to link theory, algorithms, and implementation to
promote intelligent vehicle research.

This book is divided into four parts. The first part, Autonomous Intelligent
Vehicles, presents the research motivation and purposes and the state of the art of
intelligent vehicle research, and introduces the framework of intelligent vehicles. The
second part, Environment Perception and Modeling, which includes Road detection
and tracking, Vehicle detection and tracking, and Multiple-sensor based multiple-object
tracking, introduces environment perception and modeling. The third part, Vehicle
Localization and Navigation, which includes An integrated DGPS/IMU positioning
approach and Vehicle navigation using global views, presents vehicle navigation based
on integrated GPS and INS. The fourth part, Advanced Vehicle Motion Control,
introduces vehicle lateral and longitudinal motion control.

Most of this book refers to our research work at Xi'an Jiaotong University and
Carnegie Mellon University. During the last ten years of research, a large number
of people have worked on the Springrobot Project at Xi'an Jiaotong University.
I would like to express my deep respect to my Ph.D. advisor, Professor Nanning
Zheng, who led me into this field. Also I would like to thank: Yuehu Liu, Xiaojun
Lv, Lin Ma, Xuetao Zhang, Junjie Qin, Jingbo Tang, Yingtuan Hou, Jing Yang, 
Li Zhao, Chong Sun, Fan Mu, Ran Li, Weijie Wang, and Huub van de Wetering. 
Also, I would like to thank Jie Yang at Carnegie Mellon University who supported 
Hong Cheng's research work during his stay at this university and Zicheng Liu at 
Microsoft Research who helped Hong Cheng discuss vehicle navigation with global 
views. I would also like to express our sincere and deep thanks to Zhongjun Dai, who helped
immensely with figure preparation and with the typesetting of the book in LaTeX. 
Many people have helped by proofreading draft materials and providing comments 
and suggestions, including Nana Chen, Rui Huang, Pingxin Long, Wenjun Jing, and
Yuzhuo Wang. Springer has provided excellent support throughout the final stages
of preparation of this book, and I would like to thank our commissioning editor 
Wayne Wheeler for his support and professionalism as well as Simon Rees for his 
help. 


Chengdu, People's Republic of China Hong Cheng 
Autonomous Intelligent Vehicles
Chapter 1
Introduction 


1.1 Research Motivation and Purpose 


Autonomous intelligent vehicles are generic sets of technologies that augment a
vehicle with fully or partly autonomous driving for autonomy and safety purposes.
Fundamentally, autonomous intelligent vehicles draw on many mobile robot technologies.
In principle, we consider autonomous intelligent vehicles as mobile robot platforms
in this book. Hence, an intelligent vehicle consists of four fundamental technologies:
environment perception and modeling, localization and map building, path planning
and decision-making, and motion control [26], as shown in Fig. 1.1.

Human dreams are the driving force that pushes the world forward. The National
Research Council once predicted that the core weapon of the twentieth century
would be the tank, while that of the twenty-first century would be the unmanned
battle system [1]. Moreover, a third of the U.S. military ground vehicles
must be unmanned by 2015. Therefore, beginning in the 1980s, the Defense Advanced
Research Projects Agency (DARPA) initiated a new project, namely the unmanned
battle project. Its goal is to design a car which can autonomously implement
navigation, obstacle avoidance, and path planning. This opened the era of intelligent
vehicles. Moreover, the U.S. Department of Energy launched a ten-year robot
and intelligent system plan (1986-1995), as well as the space robot plan. In terms
of space exploration, the National Aeronautics and Space Administration (NASA)
has developed several wheeled rovers, such as Spirit and Opportunity, for scientific
exploration.¹

A major concern associated with the rapid growth in automotive production is
an increase in traffic congestion and accidents [36]. To address the problem,
governments all over the world have been increasing funds to improve the traffic
infrastructure, enforce traffic laws, and educate drivers about traffic regulations. In
addition, research institutes have launched R&D projects in driver assistance and
safety warning systems. Therefore, in the last decade, many research works in the


¹ http://marsrovers.jpl.nasa.gov/overview/.
Fig. 1.1 The basic framework of autonomous intelligent vehicles


area of intelligent vehicles all over the world have led to Intelligent Transportation
Systems (ITS) for improving road safety and reducing traffic accidents [7]. Autonomous
intelligent vehicle technologies are now widely applied in Driver Assistance and Safety
Warning Systems (DASWS) [36], such as Forward Collision Warning [9, 27], Adaptive
Cruise Control [32], and Lane Departure Warning [16]. In recent years, with economic
and social development, the issues of traffic safety, energy shortage, and
environmental pollution have become more serious. Those problems have led to higher
volumes of research and applications. Toward this end, by combining vehicles, drivers,
and lanes, we can achieve better traffic capacity and traffic safety using
computer control, artificial intelligence, and communication technologies [3].

The most important causes of the large number of traffic accidents are
burdensome driving and fatigued driving. When driving on congested roads,
drivers have to perform many operations, such as shifting gears and operating the
clutch, and they have to complete 20 to 30 coordinated hand and foot movements each
minute. With economic development and the increase in vehicle ownership, the
number of non-professional drivers is rising, leading to frequent traffic accidents.
As a result, traffic accidents have become the foremost public hazard in modern
society. Traffic problems trouble the whole world, and the question of how
to improve traffic safety has become an urgent social issue. Lane departure systems,
fatigue detection systems, and automatic cruise control can greatly reduce the driver's
workload and improve transportation system safety.

The wide application prospects of intelligent vehicles promote the development
of transportation systems and attract a growing number of research institutions
and auto manufacturers. DARPA has held the Grand Challenges and the Urban
Challenge since 2004. Their goal is to develop autonomous intelligent vehicles
capable of both perceiving various environments, such as desert trails, roads, and
urban areas, and navigating at high speeds² [5, 30, 31]. In the first Grand
Challenge, CMU's Sandstorm traveled 7.4 miles from the start, demonstrating the
possibilities of autonomous capability [30]. In 2005, five vehicles, namely Stanley,
Sandstorm, Highlander, Kat-5, and TerraMax, were able to complete that challenge, and
Stanley took first place ahead of Sandstorm [31]. After the success of the two
Grand Challenges, DARPA organized the Urban Challenge [5]. In the Urban
Challenge, based on technical reports on implementing safe and capable
autonomous vehicles, DARPA allowed 53 teams to demonstrate how they navigate
simple urban driving scenes. After these demonstrations, only 36 teams were
invited to attend the National Qualification Event (NQE). Finally, only 11 teams
qualified for the Urban Challenge Final Event (UCFE). In China, the 2008
Beijing Olympic Games, whose slogans included Hi-tech Olympics and Green Olympics,
adopted many advanced traffic management systems, intelligent vehicles, and electric
vehicles to improve vehicle safety, reduce pollution, and ease traffic
congestion. Consequently, those innovations drew the attention of many researchers.
In 2011, China released ten leading-edge technologies and modern transportation
technologies, among which were technologies aimed at developing intelligent
vehicles. Moreover, the National Natural Science Foundation of China launched a
state key development plan in 2008, so that cognitive computing based on
audio-visual information³ could integrate human-computer interfaces, computer vision,
language understanding, and cooperative computing. Building upon those
achievements, the goal of this plan is to develop autonomous intelligent vehicles which
are capable of both perceiving the natural environment and making intelligent decisions.
Meanwhile, similar to the Grand Challenge supported by DARPA, the plan holds
the Future Challenge each year.


² http://www.urban-challenge.com/_eng/.

The research on intelligent vehicles can greatly facilitate the rapid development
of other disciplines, such as planetary exploration. The U.S. Mars rovers Spirit and
Opportunity play an irreplaceable role in exploring Mars and the vast universe
beyond Mars [13, 23]. In China, the government released the White Paper "China
Aerospace" in November 2000, which targets exploring the moon and other planets
in the near future. Furthermore, space mobile robots are a key part of planetary
exploration, which could in turn benefit the utilization of solar energy.


1.2 The Key Technologies of Intelligent Vehicles 


As mentioned before, intelligent vehicles are a set of intelligent agents which
integrate multi-sensor fusion based environment perception and modeling,
localization and map building, path planning and decision-making, and motion control,
as shown in Fig. 1.1. The environment perception and modeling module is responsible
for sensing environment structures using multiple sensors and providing a model of
the surrounding environment. Here, the environment model includes a list of
moving objects, a list of static obstacles, the vehicle position relative to the current road, the

³ http://ccvai.xjtu.edu.cn.
Fig. 1.2 Multi-sensor fusion based modeling and environment perception


road shape, etc. Finally, this module provides the environment model and the local
map to the localization and map building module by processing the raw vision,
lidar, and radar data. The second module, vehicle localization and map building,
uses the estimated locations of geometric features in the map to determine the vehicle's
position, and interprets sensor information to estimate the locations of geometric
features in a global map. As a result, the second module yields a global map based
on the environment model and a local map. The path planning and decision-making
module helps ensure that the vehicle is operated in accordance with the
rules of the road, safety, comfort, vehicle dynamics, and environment
contexts. Hence, this module can potentially improve mission efficiency and generate
the desired path. The final module, motion control, executes the commands
necessary to follow the planned paths, thus yielding interaction between the vehicle
and its surrounding environment. A brief introduction to these modules is presented
below.


1.2.1 Multi-sensor Fusion Based Environment Perception and 
Modeling 


Figure 1.2 illustrates a general environment perception and modeling framework.
From this framework, we can see that: (i) The original data are collected by
various sensors; (ii) Various features are extracted from the original data, such as road
(object) colors, lane edges, and building contours; (iii) Semantic objects, which
consist of lanes, signs, vehicles, and pedestrians, are recognized using classifiers;
(iv) We can deduce driving contexts and vehicle positions.


1. Multi-sensor fusion 

Multi-sensor fusion is the basic framework that intelligent vehicles use to
better sense the structure of the surrounding environment and to detect objects/obstacles.
Roughly, the sensors used for surrounding environment perception are divided
into two categories: active and passive. Active sensors include lidar, radar,
ultrasonic, and radio sensors, while the commonly used passive sensors are infrared and
visual cameras. Different sensors provide different detection precision and range,
and are affected differently by the environment. Combining various sensors
therefore makes it possible to cover both short-range and long-range
objects/obstacles, and also to work in various weather conditions. Furthermore, the
original data of different sensors can be fused by low-level fusion, high-level
fusion, or hybrid fusion [4, 14, 20, 35].

2. Dynamic Environment Modeling 

Dynamic environment modeling based on moving on-vehicle cameras plays
an important role in intelligent vehicles [17]. However, this is extremely
challenging due to the combined effects of ego-motion, blur, and changing light.
Therefore, traditional methods designed for gradual illumination change and small
moving objects [28], such as background subtraction, no longer work well, even
though they have been widely used in surveillance applications. Consequently, more and
more approaches try to handle these issues [2, 17]. Unfortunately, reliably
modeling and updating the background is still an open problem.

To select among different driving strategies, several broad scenarios are usually
considered in path planning and decision-making, such as navigating roads,
intersections, parking lots, and jammed intersections. Hence, scenario estimators, which
were commonly used in the Urban Challenge, are helpful for further decision-making.

3. Object Detection and Tracking 

In general, in a driving environment, we are interested in static/dynamic ob- 
stacles, lane markings, traffic signs, vehicles, and pedestrians. Correspondingly, 
object detection and tracking are the key parts of environment perception and 
modeling. 
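As an illustration of the high-level fusion mentioned in item 1 above, the following minimal Python sketch combines two independent range estimates of the same obstacle by inverse-variance weighting. This is one common textbook fusion rule and not necessarily the scheme used in the cited systems. The RangeEstimate class, the fuse_ranges function, and all numerical values are hypothetical and chosen only for illustration; the sketch also assumes that each sensor's measurement variance is known.

```python
from dataclasses import dataclass


@dataclass
class RangeEstimate:
    """One sensor's estimate of the distance to the same object (hypothetical type)."""
    sensor: str        # label only, e.g. "radar" or "lidar"
    distance_m: float  # estimated range in metres
    variance: float    # measurement variance in m^2 (assumed known)


def fuse_ranges(estimates):
    """Combine independent estimates by inverse-variance weighting.

    Returns (fused_distance, fused_variance).
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [1.0 / e.variance for e in estimates]
    total = sum(weights)
    fused = sum(w * e.distance_m for w, e in zip(weights, estimates)) / total
    return fused, 1.0 / total


# Illustrative values: the lidar is assumed to be more precise than the radar.
radar = RangeEstimate("radar", 50.0, 4.0)
lidar = RangeEstimate("lidar", 52.0, 1.0)
fused_distance, fused_variance = fuse_ranges([radar, lidar])
```

The fused estimate lies closer to the more precise sensor, and its variance is smaller than that of either input, which is the basic motivation for combining sensors with different precision.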


1.2.2 Vehicle Localization and Map Building 


The goal of vehicle localization and map building is to generate a global map by
combining the environment model, a local map, and global information. In
autonomous driving, vehicle localization is either to estimate road geometry or to
localize the vehicle relative to roads under the conditions of known maps or
unknown maps. Hence, vehicle localization involves road shape estimation, position
filtering, and transforming the vehicle pose into a coordinate frame. For vehicle
localization, we face several challenges: (i) Usually, the absolute positions from
GPS/DGPS and its variants are insufficient due to signal transmission problems; (ii) The
path planning and decision-making module needs more than just the vehicle's absolute
position as input; (iii) Sensor noise greatly affects the accuracy of vehicle
localization. Regarding the first issue, though GPS and its variants have been widely
Fig. 1.3 The framework of vehicle localization and navigation


used in vehicle localization, their performance can degrade due to signal blockages
and reflections from buildings and trees. In the worst case, an Inertial Navigation
System (INS) can maintain a position solution. As for the second issue, local maps
fusing laser, radar, and vision data with vehicle states are used to locate and track
both static/dynamic obstacles and lanes. Furthermore, global maps can contain
lane geometric information, lane markings, stop signs, parking lots, and check points,
and provide global environment information. As for the third issue, various noise
models are considered to reduce localization error [26].

Map building using various sensors has been addressed by many researchers [18,
22], and it requires interpreting the sensor information. Intelligent
vehicles can be navigated under the conditions of either known maps or unknown
maps. For example, the DARPA Urban Challenge provided the Route Network
Definition File (RNDF), which belongs to the case of known maps. However, in
exploring Mars, intelligent vehicles cannot have maps of Mars beforehand. This
problem is formulated as localizing vehicles traveling in an unknown environment.
Here, we handle the dual task of localizing the vehicle and simultaneously
modeling the environment, also known as Simultaneous Localization and Map
Building (SLAM) [8]. Figure 1.3 illustrates the framework of vehicle localization in
an iterated way.
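The interplay between GPS and INS described in this section can be sketched with a one-dimensional Kalman filter: inertial dead reckoning predicts the position at every step, and a GPS fix corrects it whenever one is available. The code below is a minimal sketch under strong assumptions (a single position state, a known constant velocity from the INS, and known noise variances); the function names and all numerical values are hypothetical and are not taken from the positioning approach presented later in this book.

```python
def predict(x, p, velocity, dt, q):
    """Dead-reckoning step: advance the position with the INS velocity.

    The variance p grows by the process noise q at every prediction.
    """
    return x + velocity * dt, p + q


def update(x, p, z, r):
    """Correct the prediction with a GPS fix z whose variance is r."""
    k = p / (p + r)  # Kalman gain
    return x + k * (z - x), (1.0 - k) * p


def track(x0, p0, velocity, dt, q, r, gps_fixes):
    """Run the filter; gps_fixes holds one position per step, or None
    when the signal is blocked (for example by buildings or trees)."""
    x, p = x0, p0
    history = []
    for z in gps_fixes:
        x, p = predict(x, p, velocity, dt, q)
        if z is not None:
            x, p = update(x, p, z, r)
        history.append((x, p))
    return history


# Illustrative run: GPS is available at the first and last steps only.
history = track(x0=0.0, p0=1.0, velocity=10.0, dt=1.0, q=0.5, r=4.0,
                gps_fixes=[10.5, None, None, 39.0])
```

During the two steps without GPS, the position is still propagated by dead reckoning while its variance grows; the next GPS fix reduces the variance again. This mirrors the statement above that an INS can maintain a position solution when the GPS signal is blocked.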


1.2.3 Path Planning and Decision-Making 


For the purpose of safe and energy-saving navigation, vehicles try to find an
optimal path in 2D/3D road space from the initial position to the target position while
avoiding collisions with both static and dynamic obstacles. Hence, global path planning is to
find the fastest and safest way to get from the initial position to the goal position,
while local path planning is to avoid obstacles for safe navigation [6, 11].
Decision-making consists of mission planning and behavioral reasoning. When a vehicle
autonomously navigates through the environment, the mission planner incorporates
new observations, thus updating the local maps. Afterwards, the mission
planner generates a new rule. The behavioral planner implements behavioral reasoning
and carries out the rule generated by the mission planner. Its functions include
road following, making lane changes, parking, obstacle avoidance, and recovering from
abnormal conditions. In many cases, decision-making depends on the driving context,
especially in driver assistance systems [10].
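To make the idea of global path planning around static obstacles concrete, the following minimal sketch runs the A* search algorithm on a small occupancy grid. A* is a standard planning technique used here only as an example; this section does not prescribe a particular planner. The grid, the astar function, and the unit-cost four-connected motion model are all assumptions made for illustration.

```python
import heapq


def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) and 1 (obstacle) cells.

    Returns the list of (row, col) cells from start to goal, or None
    if the goal cannot be reached. The start cell is assumed to be free.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > cost[cell]:
            continue  # stale heap entry, a cheaper route was found already
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = g + 1
                if new_cost < cost.get(nxt, float("inf")):
                    cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(
                        open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


# A wall of obstacles forces a detour through the rightmost column.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
```

In a real vehicle the cost of a cell or road segment would also reflect travel time, safety margins, and traffic rules, and a local planner would refine the result to avoid dynamic obstacles.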


1.2.4 Low-Level Motion Control 


The problem of vehicle lateral and longitudinal control has stimulated
significant research work in the last two decades. Typical applications include
automatic vehicle following/platooning [12, 29], Adaptive Cruise Control (ACC)
[25, 33], and lane following [21]. Vehicle control can be broadly divided into two
categories: lateral control and longitudinal control [19] (Fig. 1.4). Longitudinal
control [29] is related to distance-velocity control between vehicles for safety and
comfort purposes. Here some assumptions are made about the state of vehicles and
the parameters of models, such as in the PATH project [12]. Lateral control is
to maintain the vehicle's position in the lane center, and it can be used for vehicle
guidance assistance [15, 34]. Moreover, it is well known that the lateral and
longitudinal dynamics of a vehicle are coupled in a combined lateral and longitudinal
control, where the coupling degree is a function of the tire and vehicle parameters
[24, 34]. In general, there are two different approaches to designing vehicle controllers.
One is to mimic driver operations, and the other is based on vehicle
dynamic models and control strategies.
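The distance-velocity control mentioned above can be sketched with a constant time-gap spacing policy, a commonly used simplification for ACC-style longitudinal control. The sketch below is not the controller developed later in this book. It assumes an idealized vehicle whose acceleration follows the command exactly, and the function names, gains, limits, and initial conditions are hypothetical values chosen only for illustration.

```python
def desired_gap(speed, standstill_gap=5.0, time_gap=1.5):
    """Constant time-gap spacing policy: the gap grows linearly with ego speed."""
    return standstill_gap + time_gap * speed


def acc_command(gap, ego_speed, lead_speed,
                k_gap=0.2, k_speed=0.6, a_min=-3.0, a_max=2.0):
    """Acceleration command (m/s^2) from spacing error and relative speed."""
    spacing_error = gap - desired_gap(ego_speed)
    accel = k_gap * spacing_error + k_speed * (lead_speed - ego_speed)
    return max(a_min, min(a_max, accel))  # saturate for comfort and safety


def simulate(steps=600, dt=0.1):
    """Follow a lead vehicle that cruises at a constant 20 m/s.

    Returns the final gap (m) and the final ego speed (m/s).
    """
    lead_pos, lead_speed = 50.0, 20.0
    ego_pos, ego_speed = 0.0, 25.0
    for _ in range(steps):
        accel = acc_command(lead_pos - ego_pos, ego_speed, lead_speed)
        ego_speed = max(0.0, ego_speed + accel * dt)
        ego_pos += ego_speed * dt
        lead_pos += lead_speed * dt
    return lead_pos - ego_pos, ego_speed


final_gap, final_speed = simulate()
```

With these gains the ego vehicle slows from 25 m/s to the lead vehicle's speed and settles at the gap requested by the spacing policy. A real controller must additionally account for actuator delays, engine and brake dynamics, and the nonlinear effects discussed in this section.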


1.3 The Organization of This Book 


This book consists of four parts. The first part is a basic introduction to
intelligent vehicles. Chapter 1 introduces the research motivation and
purpose and the key technologies. Chapter 2 presents the state-of-the-art of intelligent
vehicles in the USA. Chapter 3 introduces the proposed basic framework of in- 
telligent vehicles. The second part presents environment perception and modeling. 
Chapter 4 presents road detection and tracking algorithms for structured and un- 
structured roads. Chapter 5 presents on-road vehicle detection and tracking algo- 
rithms using Boosted Gabor Features. Chapter 6 introduces a multiple-sensor based 
multiple-object tracking approach. The third part is about vehicle localization and 
navigation. Chapter 7 introduces an integrated DGPS/IMU positioning approach. 
Chapter 8 presents a vehicle navigation approach using global views. The final part 
is about advanced vehicle motion control. In Chapter 9, a lateral control approach is 
introduced. In the final chapter, a longitudinal control approach is presented. 
Chapter 2
The State-of-the-Art in the USA 


2.1 Introduction 


The field of intelligent vehicles is growing rapidly all over the world, in both the
diversity of applications and research [3, 8, 18]. Especially in the U.S., government
agencies, universities, and companies working in this field hope to develop fully or
partly autonomous driving for safety and for saving energy. Many earlier
technologies, such as seat belts and air bags, work only after a traffic accident has
occurred, whereas intelligent vehicles aim to stop traffic accidents from happening in
the first place. To this end, DARPA organized the Grand Challenges and the Urban
Challenge from 2004 to 2007, which remarkably promoted intelligent vehicle
technologies around the world. Hence, this chapter presents an overview of the most
advanced intelligent vehicle projects that took part in either the Grand Challenges or
the Urban Challenge supported by DARPA in the USA.


2.2 Carnegie Mellon University—Boss 


The research groups at Carnegie Mellon University have developed the Navlab
series [8, 17], from Navlab 1 to 11, which includes robot cars, trucks, and buses.
Navlab's applications have included Supervised Classification Applied to Road
Following (SCARF) [6, 7], Yet Another Road Following (YARF) [12], Autonomous
Land Vehicle In a Neural Net (ALVINN) [11], and the Rapidly Adapting Lateral
Position Handler (RALPH) system [16]. In addition, Sandstorm is an autonomous
vehicle which was modified from the High Mobility Multipurpose Wheeled Vehicle
(HMMWV) and competed in the DARPA Grand Challenge in 2005. Highlander
is another autonomous vehicle, modified from an HMMWV H1, which competed in the
same competition in 2005.

The latest intelligent vehicle is the Boss system (shown in Fig. 2.1),
which won first place in the 2007 Urban Challenge [18]. Boss combines various
active and passive sensors to provide faster and safer autonomous driving in an urban
Fig. 2.1 The intelligent vehicle, named Boss, developed by Carnegie Mellon University's Red
Team (published courtesy of Carnegie Mellon University) 


environment. Active sensors include lidar and radar, and passive sensors include the 
Point Grey high-dynamic-range camera. The following functional modules were 
implemented on the Boss vehicle: 


1. Environment perception: Basically, the perception module provides a list of
tracked moving objects, static obstacles in a regular grid, vehicle localization
relative to roads, road shape, etc. Furthermore, this module consists of four
subsystems: moving obstacle detection and tracking, static obstacle detection and
tracking, roadmap localization, and road shape estimation.

2. A three-layer planning system consisting of mission, behavioral, and motion
planning is used to drive in urban environments. Mission planning is to detect
obstacles and plan a new route to the goal. Here, given a Road Network Definition
File (RNDF) encoding environment connectivity, a cost graph guides the vehicle to
travel on a road/lane planned by the behavioral subsystem. A value function is
calculated both to provide the path from each waypoint to the target waypoint and
to allow the navigation system to respond when an error occurs. Furthermore, Boss
is capable of planning another route if there is a blockage.
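The value function described in item 2 can be sketched as a cost-to-goal table computed over a directed road graph, for example by running Dijkstra's algorithm backwards from the target waypoint. The code below is only an illustration of that idea and is not Boss's actual implementation. The waypoint names, the edge costs, and the functions cost_to_goal and next_waypoint are hypothetical, and a blockage is modelled simply by removing an edge and recomputing the table.

```python
import heapq


def cost_to_goal(edges, goal):
    """Dijkstra run backwards from the goal over directed edges.

    edges maps (from_waypoint, to_waypoint) -> traversal cost.
    Returns {waypoint: minimum cost to reach the goal}; waypoints that
    cannot reach the goal are absent from the result.
    """
    incoming = {}
    for (src, dst), cost in edges.items():
        incoming.setdefault(dst, []).append((src, cost))
    value = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        v, node = heapq.heappop(heap)
        if v > value[node]:
            continue  # stale heap entry
        for src, cost in incoming.get(node, []):
            candidate = v + cost
            if candidate < value.get(src, float("inf")):
                value[src] = candidate
                heapq.heappush(heap, (candidate, src))
    return value


def next_waypoint(edges, value, node):
    """Greedy step: follow the outgoing edge that minimises cost-to-goal."""
    options = [(cost + value[dst], dst)
               for (src, dst), cost in edges.items()
               if src == node and dst in value]
    return min(options)[1] if options else None


# Hypothetical road network with two routes from A to D.
edges = {("A", "B"): 1.0, ("B", "D"): 1.0, ("A", "C"): 2.0, ("C", "D"): 2.0}
value = cost_to_goal(edges, "D")
first_choice = next_waypoint(edges, value, "A")

# A blockage on B -> D is modelled by removing the edge and recomputing.
blocked = {e: c for e, c in edges.items() if e != ("B", "D")}
value_blocked = cost_to_goal(blocked, "D")
detour_choice = next_waypoint(blocked, value_blocked, "A")
```

Because the table stores a cost for every waypoint that can reach the goal, the vehicle can recover from a missed turn by simply continuing from wherever it is, and a blockage only requires updating the graph and recomputing the table.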


The behavioral subsystem is in charge of executing the rules generated by
mission planning. In detail, this subsystem makes lane-change, precedence,
and safety decisions in different driving contexts, such as roads and
intersections. Its tasks include carrying
out the rules generated by the mission planner, responding to abnormal
conditions, and identifying driving contexts: roads, intersections, and zones.
These driving contexts correspond to different behavior strategies consisting
of lane driving, intersection handling, and achieving a zone pose. The third layer
Fig. 2.2 Stanford University's intelligent vehicle Junior, which was the runner-up in the 2007
DARPA Urban Challenge (published courtesy of Stanford University)


of the planning system is the motion planning subsystem which consists of trajec- 
tory generation, on-road navigation, and zone navigation. This layer is responsible 
for executing the current motion goal from the behavior subsystem. In general, this 
subsystem generates a path towards the target, and tracks the path. 


2.3 Stanford University—Junior 


Stanford University's research team on intelligent vehicles is one of
the most experienced and successful research labs in the world. To better study
and promote the applications of autonomous intelligent vehicles, the Volkswagen
Group founded the Volkswagen Automotive Innovation Laboratory (VAIL). To date,
Stanford University has collaborated with the Volkswagen Group to build several
intelligent vehicles: Stanley (the autonomous Volkswagen Touareg that won the
DARPA Grand Challenge in 2005 [10]) and Junior (the autonomous Volkswagen Passat
that was the runner-up in the 2007 DARPA Urban Challenge [14]). Moreover, Google
has licensed the sensing technology from Stanley to map out 3D digital cities all over
the world. Below, we introduce Junior, which participated in the 2007 Urban
Challenge.

Junior [14], shown in Fig. 2.2, is a modified 2006 Volkswagen Passat wagon,
equipped with five laser range finders, a GPS/INS, five radars, two Intel quad-core
computer systems, and a custom drive-by-wire interface. With these sensors, the
vehicle is capable of detecting an obstacle up to 120 m away.

Junior’s software architecture is designed as a data-driven pipeline and consists 
of five modules: 
• Sensor interface: This interface provides data for other modules.

• Perception modules: These modules segment sensor data into moving vehicles
and static obstacles, and also provide accurate position relative to the digital map
of the environment.

• Navigation modules: These modules consist of motion planners and a hierarchical
finite state machine, and generate the behavior of the vehicle.

• Drive-by-wire interface: This interface receives the control commands from
navigation modules, and enables the control of throttles, brakes, steering wheels, gear
shifting, turn signals, and emergency brake.

• Global services: The system can provide logging, time stamping, message-passing
support, and watch-dog functions to keep the system running reliably.


Next, we introduce three fundamental modules: environment perception,
precision localization, and navigation. The perception module has two basic
functions: static/dynamic obstacle detection and tracking, and RNDF localization and
update. Here, lasers perform the primary scanning, and a radar system complements
them as an early warning for moving objects at intersections. After perceiving
the traffic environment, Junior estimates a local alignment between a digital map in
RNDF form and its current position from local sensors. In the navigation module, the
first task is to plan global paths, where there are two navigation cases: road
navigation and free-style navigation. However, the basic navigation modules do not
include intersections. Furthermore, Junior strives to prevent itself from getting stuck
in the behavior hierarchy.
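The idea of a behavior state machine that avoids getting stuck can be sketched with a flat finite state machine and a progress timer. This is a deliberately simplified illustration, not Junior's actual hierarchical design: the state names, the BehaviorFSM class, and the timeout rule are all hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical behavior states, chosen for illustration only."""
    FOLLOW_LANE = auto()
    WAIT_AT_INTERSECTION = auto()
    FREE_FORM_ESCAPE = auto()  # fallback when normal behavior makes no progress


class BehaviorFSM:
    def __init__(self, stuck_timeout=10.0):
        self.state = State.FOLLOW_LANE
        self.stuck_timeout = stuck_timeout
        self.time_without_progress = 0.0

    def step(self, dt, made_progress, at_intersection):
        """Advance the machine by dt seconds and return the new state."""
        if made_progress:
            self.time_without_progress = 0.0
        else:
            self.time_without_progress += dt

        if self.time_without_progress >= self.stuck_timeout:
            # No progress for too long: switch to the fallback behavior.
            self.state = State.FREE_FORM_ESCAPE
        elif at_intersection:
            self.state = State.WAIT_AT_INTERSECTION
        else:
            self.state = State.FOLLOW_LANE
        return self.state


# Three seconds without progress at an intersection trigger the fallback.
fsm = BehaviorFSM(stuck_timeout=3.0)
trace = [fsm.step(1.0, made_progress=False, at_intersection=True)
         for _ in range(3)]
recovered = fsm.step(1.0, made_progress=True, at_intersection=False)
```

Once progress resumes, the timer is reset and the machine returns to its normal states. A hierarchical version would nest such machines so that each driving context has its own substates and its own recovery rule.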

Nowadays, researchers at Stanford University are still working on autonomous
parking in tight parking spots¹ and autonomous valet parking.


2.4 Virginia Polytechnic Institute and State University— Odin 


Team VictorTango, formed by Virginia Tech and TORC Technologies, developed
Odin² [2], which took third place in the 2007 DARPA Urban Challenge. Odin
consists of three main parts: the base vehicle body, perception, and planning.

We first introduce the base vehicle platform. Odin is a modified 2005 Ford
Escape Hybrid, shown in Fig. 2.3. Its main computing platform is a pair of HP servers,
each with two quad-core processors.

In the perception module, there are three submodules: object classification, lo- 
calization, and road detection. Here, object classification first detects obstacles and 
then classifies them as either static or dynamic. The localization submodule yields 
the vehicle position and direction in the 3D world. The road detection submodule 
extracts a road coverage map and lane position. 

The planning module uses a Hybrid Deliberative-Reactive model, which con- 
sists of upper level decisions and lower level reactions as separate components. The 


¹ http://cs.stanford.edu/group/roadrunner/.
² http://www.me.vt.edu/urbanchallenge/.
Fig. 2.3 The intelligent vehicle Odin developed by Team VictorTango (published courtesy of
Virginia Polytechnic Institute and State University)


coarsest level of planning is the route planner responsible for road segments and 
zones the vehicle should travel in. The driving behavior component takes care of 
obeying road rules. Motion planning is in charge of translating control commands 
into actuator control signals. 


2.5 Massachusetts Institute of Technology—Talos 


Team MIT has developed an urban autonomous vehicle called Talos³ (shown in 
Fig. 2.4) [1, 9, 13]. It has three key novel features: (i) a perception-based 
navigation strategy; (ii) a unified planning and control architecture; (iii) a powerful 
new software infrastructure. The vehicle consists of various submodules: 
Road Paint Detector, Navigator, Lane Tracker, Driveability Map, Obstacle Detector, 
Motion Planner, Fast Vehicle Detector, Controller, and Positioning Modules. The 
perception module includes the obstacle detector, hazard detector, and lane tracking 
submodules. The planning and control algorithms involve a navigator, a driveability map, 
a motion planner, and a controller. The navigator plays an important role in 
mission-level behavior, and the rest of these submodules work together in tight coupling to 
yield the desired motion control goal in complex driving conditions. 


³http://grandchallenge.mit.edu/. 
Fig. 2.4 The intelligent vehicle Talos developed by the Team MIT (published courtesy of 
Massachusetts Institute of Technology) 





Fig. 2.5 The intelligent vehicle Skynet developed by Team Cornell (published courtesy of Cornell 
University) 


2.6 Cornell University—Skynet 
Team Cornell's Skynet⁴ is a modified Chevrolet Tahoe, shown in Fig. 2.5, and 
consists of two groups of sensors [15]. One group is used for sensing the vehicle itself, and 


the other group (laser, radar and vision) is for sensing the environment. Thanks to 
the above sensors, Skynet is capable of providing real-time position, velocity, and 


⁴http://www.cornellracing.com/. 
attitude for absolute positioning. Moreover, Skynet's local map, which includes obstacle 
detection information, is a map of the local environment surrounding Skynet. In many 
cases, autonomous driving in complex scenes requires more than basic obstacle 
avoidance. Hence, the vehicle-centric local map is not enough for absolute positioning. 
We need to estimate environment structures using posterior pose and track generator 
algorithms. 

Skynet uses a probabilistic representation of the environment to plan mission 
paths within the context of the rule-based road network. Its intelligent planner 
includes three primary layers: a behavioral layer, a tactical layer, and an operational 
layer. The goal of the behavioral layer is to determine the fastest route to the next 
mission point. When a state transition occurs in the behavioral layer, the corresponding 
component of the tactical layer is executed. Among the four tactical components, the 
road tactical component seeks a proper lane and monitors other agents in the 
same and neighboring lanes. The intersection tactical component handles 
intersection queuing behavior and safe merging. The zone tactical component takes care 
of basic navigation in unconstrained cases. The blockage tactical component 
detects obstacles, judges whether there is a temporary traffic jam, 
and acts accordingly. The final layer is the operational layer, which is in charge of 
converting local driving boundaries and a reference speed into commands for the 
actuators: steering wheel, throttle, and brakes. 


2.7 University of Pennsylvania and Lehigh University—Little 
Ben 


Little Ben⁵, designed by the Ben Franklin Racing Team, is a modified Toyota Prius 
fitted with various sensors and computers for the 2007 DARPA Urban Challenge [4], 
shown in Fig. 2.6. Similar to other intelligent vehicles, Little Ben is equipped with 
various sensors, such as three LMS291, two SICK LDRS, and a Bumblebee stereo 
camera. The sensor array provides timely information about the surrounding 
environment, which is integrated into a dynamic map for environment perception and 
modeling. 

Little Ben's software framework consists of perception, planning, and control. 
Its perception module is responsible for providing static obstacles, moving vehicles, 
lane markings, and traversable ground. Little Ben's primary medium-to-long-range 
lidars are responsible for geometric obstacle and ground classification, road 
marking extraction, and dynamic obstacle tracking. Moreover, the stereo vision system 
is used to detect nearby road markings. Once the perception module generates 
information about static obstacles, dynamic obstacles, and lane markings, the MapPlan 
module updates obstacle and lane marking likelihoods in a map centered at the 
current vehicle location. The mission and path planning consists of two stages. The 
first stage is to calculate the optimal path by minimizing the mission time. The next 


⁵http://benfranklinracingteam.org/. 
Fig. 2.6 The intelligent vehicle Little Ben (published courtesy of the University of Pennsylvania 
and Lehigh University) 


stage is to incorporate the dynamic map into new path planning. Afterwards, the 
path follower module is responsible for calculating the vehicle steering and throttle- 
brake commands to follow the desired trajectory. 


2.8 Oshkosh Truck Corporation—TerraMax 


The TerraMax vehicle is a joint effort by Oshkosh Truck Corp., Rockwell Collins, 
and the University of Parma [5], and is shown in Fig. 2.7. In this project, Rockwell 
Collins was in charge of the intelligent vehicle management system. Oshkosh 
Truck Corporation worked on project organization, system integration, low-level 
control hardware, modeling and simulation support, and the vehicle, while the 
University of Parma provided the vision module. The most notable feature is the 
size of this vehicle (it weighs around 30,000 pounds and is 27 feet long, 8 feet wide, 
and 8 feet high), so it has to travel slowly. 

Based on a dynamic analysis of its mechanical systems, TerraMax provides 
underbody clearance, steering angle, and lateral stability information to the control 
modules. The full vehicle model consists of suspensions, steering, chassis, and tires. 
A simulation over 70 different obstacles is used to evaluate the underbody clearance, 
for better handling of different obstacles at low speeds. The steering simulation is 
used to allocate both the front and rear steering angles for a given steering wheel 
input. In addition, constant-radius tests were used to evaluate the lateral stability of 
the truck. 

The Intelligent Vehicle Management System (iVMS) developed by Rockwell 
Collins is an interface between the vehicle systems and onboard sensors. Moreover, 


⁶http://en.wikipedia.org/wiki/TerraMax_(vehicle). 
Fig. 2.7 The intelligent vehicle TerraMax (published courtesy of the Oshkosh Truck Corporation) 


the iVMS provides various autonomous functions, such as vehicle control, real-time 
path planning, obstacle detection, behavior management, and navigation. 


Chapter 3 
The Framework of Intelligent Vehicles 


3.1 Introduction 


The Asian Development Bank states: "In the five years 2000-2004, more than 
500,000 people were killed and around 2.6 million injured in road accidents in the 
People's Republic of China (PRC), equivalent to one death every 5 minutes—the 
highest rate in the world." It also estimates a yearly economic loss of $12.5 billion. 
Driver assistance and safety warning systems promise to provide partial solutions 
to these problems, and consequently many research efforts [1] aim at developing 
algorithms and building frameworks for them. 

Road situation analysis requires not only obstacle information at the current time, 
but also predicted obstacle information at a future time. Indeed, an experienced 
driver looks several seconds along the road and bases his actions on information so 
obtained. This previewing of the road is necessary to avoid accidents since vehicle 
dynamics limits the car in making speed or direction changes. 

I²DASW uses more than one kind of sensor: image sensors, lidar, and radar. 
No single sensor can provide input as complete, robust, and accurate as required by 
I²DASW. Image sensors have some problems, such as a weak ability to sense depth 
and a higher computational burden than lidar and radar. Radar provides limited lateral 
spatial information, because either it is not available at all, or the field of view is narrow, 
or the resolution is reduced at large distances. Although lidar has a wide field of view 
that solves part of the previous problems, it has other problems, such as poor 
discrimination ability, clustering errors, and recognition latency. These restrictions of 
the different sensor types explain the attention given to sensor fusion in research on 
object detection and tracking [1, 3], resulting in a wide spectrum of promising 
applications in assisted driving, including multi-sensor Adaptive Cruise Control (ACC), 
fusion of advanced ACC and lane keeping systems [5], and smart airbag systems. 

On the basis of the work [15, 16], we proposed a road safety situation and threat 
analysis algorithm and framework based on driver behavior and vehicle dynamics. 
In the current environment modeling phase, obstacles are detected and tracked by 
fusing various sensors, depending on the application. In the future situation assessment 
phase, we use the position and size of obstacles at the current time k and the vehicle 
dynamics equation to predict the future road situation at time k + 1. For lidar 
data, we distinguish static objects from moving objects by estimating object 
speed. 
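The static/moving distinction can be sketched as follows. This is only an illustration under stated assumptions: object centroids are taken to be already expressed in a ground-fixed frame (that is, ego motion has been compensated), the sampling interval is known, and the function names and the 0.5 m/s threshold are hypothetical rather than the estimator used in this book.

```python
import math


def estimate_speed(positions, dt):
    """Mean speed (m/s) from successive (x, y) centroids in metres.

    positions: centroids of one tracked object in a ground-fixed frame
    dt:        sampling interval between consecutive centroids (s)
    Net displacement is used instead of path length so that small
    frame-to-frame jitter does not accumulate into a spurious speed.
    """
    if len(positions) < 2 or dt <= 0:
        raise ValueError("need at least two positions and dt > 0")
    displacement = math.dist(positions[0], positions[-1])
    return displacement / (dt * (len(positions) - 1))


def classify_object(positions, dt, speed_threshold=0.5):
    """Label an object 'moving' or 'static' (threshold is an assumed value)."""
    speed = estimate_speed(positions, dt)
    return "moving" if speed > speed_threshold else "static"


print(classify_object([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], dt=0.1))
print(classify_object([(5.0, 5.0), (5.01, 5.0), (5.0, 5.01)], dt=0.1))
```

In practice the threshold would have to be tuned to the range noise of the lidar and to the quality of the ego-motion compensation.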

The remainder of this chapter is organized as follows. Section 3.2 introduces the 
state of the art in road safety frameworks. In Sect. 3.3, we provide a detailed description 
of our interactive safety analysis framework. 


3.2 Related Work 


Road situation analysis for driver assistance and safety warning is an 
interdisciplinary endeavor involving many research fields, for instance, computer science, 
automobile engineering, cognitive science, and psychology. It involves not only 
looking-in but also looking-out of a vehicle [12]. We classify the frameworks 
for analyzing the obstacle situation in a traffic scene into two categories. The first is 
current situation analysis, which attempts to provide the vehicle and 
the driver with the obstacles' state at the current time. Generally, sensor fusion is 
used to estimate the current obstacles' state [3, 4, 13]. The other is prediction of the 
future obstacle situation [2]. To assess the future situation, many prediction 
approaches have been used, such as the Extended Kalman Filter (EKF), the Monte 
Carlo method [2], and Bayesian networks [10]. 

Real-time safety analysis in traffic involving the driver, vehicle, traffic environment, 
and their interaction is a challenge for perception, modeling, and control. Several 
safety analysis frameworks have been proposed to address different aspects of a 
road situation [2, 4, 6, 12]. In [2], a Monte Carlo reasoning framework is used to 
evaluate the probability of a future collision, with Monte Carlo importance sampling 
approximating the collision integral. The looking-in and looking-out 
framework proposed by M.M. Trivedi et al. is a system-oriented safer driving framework 
[12] which consists of driving ecology sensing, hierarchical context processing, and 
modeling of drivers, vehicles, and environment. They build the Human-Centered 
Intelligent Driving Support System (HC-IDSS) to emphasize the role of the driver. In 
the context of an earthwork vehicle, a distributed sensor network processes 
data acquired by different sensors, integrates them, and produces an 
interpretation of the observed environment [4]. The main objectives of its low-level and 
high-level data fusion are to obtain a rough and an accurate estimate, respectively, of 
the number of objects present in the observed scene and their 3D positions. In addition, 
intersection scenario analysis was done in the INTERSAFE project, showing the 
need for driver assistance systems for intersection safety [6], where two parallel 
approaches, Top-Down-Approach and Down-Top-Approach, were realized. In these 
approaches, a dynamic risk assessment is done based on object tracking and 
classification, and the intent of the driver. Consequently, potential conflicts with other road 
users can be reported only a few seconds in advance. 

This chapter proposes an integrated current and future safety situation analysis 
framework that is as general as possible, in which we model not only the sensing phase but 
also the control phase. In this framework, a speed estimation algorithm based on 
Fig. 3.1 Interactive road situation analysis framework 


lidar data is used to distinguish two types of obstacles: static objects and moving 
objects. On the basis of the speed and type of obstacles, we first form obstacle tracks 
using each single sensor, and then a track fusion approach is used to 
yield accurate and robust global tracks. We use a camera to detect lanes and obstacles, 
such as vehicles and pedestrians, in Regions of Interest (ROIs) generated by range 
sensors. Combining the lane structure with obstacle tracks, we can model the 
traffic environment and assess the road situation at both the current time and a near-future 
time. We will introduce the multiple-sensor based multiple-object detection and 
tracking module in Chap. 6. 
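As a rough illustration of how two single-sensor tracks of the same obstacle might be combined into one global estimate, the sketch below uses component-wise inverse-variance weighting. This is a generic textbook technique chosen here for illustration, not the track fusion method of this book; it assumes the two estimates are independent and that each state component comes with a variance. The function name is hypothetical.

```python
def fuse_tracks(x1, var1, x2, var2):
    """Fuse two independent estimates of the same state, component by component.

    x1, x2:     state estimates from two sensors (e.g. [range, lateral offset])
    var1, var2: per-component variances of those estimates (all positive)
    Returns the fused state and its per-component variance.
    """
    fused, fused_var = [], []
    for a, va, b, vb in zip(x1, var1, x2, var2):
        w = vb / (va + vb)                 # weight of the first estimate
        fused.append(w * a + (1.0 - w) * b)
        fused_var.append(va * vb / (va + vb))
    return fused, fused_var


# Example: a lidar track (accurate in range) and a camera track.
state, variance = fuse_tracks([10.0, 2.0], [1.0, 4.0], [12.0, 2.4], [3.0, 4.0])
print(state, variance)
```

The fused variance is never larger than the smaller of the two input variances, which is the sense in which fusion makes the global track more accurate than either single-sensor track.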


3.3 Interactive Safety Analysis Framework 


Many existing robotics technologies apply to intelligent assisted driving [14]. 
However, much research work neglects the driver's preview and driver response 
delay; moreover, the behavior of high-speed vehicles differs greatly from that of other 
robots. For safe driving, the driver is at the center of the safety analysis [12], and driver 
response delay together with other factors restricts the driving path of a vehicle. On 
the basis of these factors, we proposed an integrated interactive road safety analysis 
framework, in which the system consists of the following modules: on-board sensor 
network, environment modeling and sensor fusion, vehicle ego-state and vehicle 
dynamics module, future situation assessment, decision-making agents, 
Human-Machine Interface (HMI), and a preview-following model based control module (see 
Fig. 3.1). In this framework, we consider a driver assistance system as a 
vehicle-driver-environment interactive closed-loop system; moreover, we focus not only on 
the current situation but also on the future situation by predicting the potential collision 
probability distribution. 

In our framework, on-board sensors provide real-time information about 
drivers, the traffic environment, and vehicles. How to configure these sensors depends 
closely on the application domain. For example, for multi-sensor ACC systems, 
radar and a camera may be enough, but for pedestrian protection systems, 
an infrared sensor is essential for robust detection under various 
weather conditions. In general, the exterior sensors capture object appearance, 
range, and sound outside the vehicle, and the interior sensors collect the vehicle state, 
such as speed, acceleration, and steering angle. 

The main functions of environment modeling and sensor fusion are to sense 
obstacles, to recognize lanes and traffic signs, and to fuse various sensors to model the 
environment. Lane detection is the problem of locating lane boundaries. We detect 
lane boundaries robustly on a variety of road types and under a variety of 
illumination changes and shadowing by introducing an adaptive Randomized Hough 
Transform (RHT) [11]. For moving objects, such as pedestrians and vehicles, we 
use statistical background modeling techniques. For instance, to detect close cut-in 
and overtaking vehicles, we build the dynamic background model when no vehicle 
is passing and update the model when a passing vehicle enters 
the field of view. 
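To make the idea of a Randomized Hough Transform concrete, here is a minimal sketch of the basic (non-adaptive) variant for a single straight line. It is not the adaptive RHT of [11]: the bin sizes, the fixed number of samples, and the function name are assumptions made for illustration. Each iteration samples two edge points, computes the one line through them in normal form (theta, rho), and votes for the corresponding quantized cell.

```python
import math
import random
from collections import Counter


def randomized_hough_line(points, n_samples=500,
                          theta_step=math.radians(1.0), rho_step=1.0, seed=0):
    """Return (theta, rho, votes) of the most-voted line x*cos(theta) + y*sin(theta) = rho.

    points: list of (x, y) edge pixel coordinates (at least two)
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        dx, dy = x2 - x1, y2 - y1
        norm = math.hypot(dx, dy)
        if norm == 0:
            continue                      # coincident points define no line
        nx, ny = -dy / norm, dx / norm    # unit normal of the line
        rho = nx * x1 + ny * y1
        if rho < 0:                       # keep rho non-negative
            nx, ny, rho = -nx, -ny, -rho
        theta = math.atan2(ny, nx)
        votes[(round(theta / theta_step), round(rho / rho_step))] += 1
    if not votes:
        raise ValueError("no valid point pairs were sampled")
    (ti, ri), count = votes.most_common(1)[0]
    return ti * theta_step, ri * rho_step, count


# Example: 20 points on the vertical line x = 5 plus a few outliers.
edge_points = [(5, y) for y in range(20)] + [(0, 0), (1, 7), (9, 3), (12, 15), (3, 18)]
print(randomized_hough_line(edge_points))
```

Compared with the standard Hough transform, each sample casts a single vote instead of a whole sinusoid, so the accumulator stays sparse. Lines passing through the origin or with theta near ±π can split their votes across two cells in this simple parameterization.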

At the situation assessment level, road safety situation in the future is assessed 
by combining traffic rules, vehicle dynamics, and environment prediction. Since 
the safety distance varies with the speed of a host vehicle, we adopt preview time 
rather than safety distance as the measurement of safety response. Hence, the safety 
response time is given by 

$$ T = \frac{d_r + d_v + d_s}{v}, \tag{3.1} $$

where $d_r$ is the distance required to respond to the nearest object due to driver 
response delay, $d_v$ is the distance to slow down, $d_s$ is the safety distance between the 
host vehicle and obstacles, and $v$ is the velocity of the host vehicle. 
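A small numerical sketch of the safety response time follows: the three distances (response-delay distance, slow-down distance, and safety distance) are summed and divided by the host speed. The helper `distance_budget` is hypothetical; it derives the first two distances from a constant reaction delay and a constant deceleration, which are simplifying assumptions not stated in the text.

```python
def safety_response_time(d_r, d_v, d_s, v):
    """Preview time (s): total distance budget divided by the host speed.

    d_r: distance covered during the driver's response delay (m)
    d_v: distance needed to slow down (m)
    d_s: safety distance kept between host vehicle and obstacle (m)
    v:   host vehicle speed (m/s), must be positive
    """
    if v <= 0:
        raise ValueError("host speed v must be positive")
    return (d_r + d_v + d_s) / v


def distance_budget(v, reaction_delay, max_decel, safety_gap):
    """Hypothetical helper: derive the three distances from simple kinematics."""
    d_r = v * reaction_delay           # constant speed during the delay
    d_v = v * v / (2.0 * max_decel)    # constant-deceleration stop
    return d_r, d_v, safety_gap


d_r, d_v, d_s = distance_budget(v=20.0, reaction_delay=1.0, max_decel=5.0, safety_gap=10.0)
print(safety_response_time(d_r, d_v, d_s, v=20.0))  # 3.5
```

Under these assumptions the required preview time grows with speed, because the slow-down distance grows quadratically while the divisor grows only linearly.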

Decision-making agents have two functions: one is to generate warning strategies 
for warning systems, such as route guide systems and the warning display device; 
the other is to make decisions about the expected path for action planning, which 
interfaces with the actuators. A Preview Optimal Curvature (POC) model based on the Preview 
and Following Model (PFM) and driver behavior characteristics is used to control 
the vehicle's velocity and direction, where the key problem is to establish fuzzy 
evaluation indexes and their membership functions that represent the front road geometry, 
traffic rules, and driver behavior. For the details we refer to [7]. Here the 
decision-making agents use rigid-body kinematics and the steady-state response 
properties of the vehicle dynamics to yield the expected path, and action planning then 
updates the ideal path using the dynamic-state response properties of the vehicle 
dynamics. Their main interaction activities involve a driver operating a vehicle and 
the vehicle producing lateral and longitudinal motion. 

Furthermore, this framework involves communication and HMI modules. 
Communication modules implement information sharing between vehicles and between 
a vehicle and a base station. An HMI module presents warning information from 
the decision and path planning module to the driver. The interaction between the 
vehicle and the traffic environment mainly focuses on vehicle-to-vehicle communication 
and vehicle-to-base-station communication. Vehicles in the future will be able to share 
information about the environment to provide cooperative, convenient, and safer 
driving. 

On the basis of the Preview-Follow theory proposed in [8, 9], we proposed a 
region-based Preview-Follow model 

$$ J = \int_{t_1}^{t_2} \left[ y_e(S, t+\xi) - y_r(S, t+\xi) \right]^2 p(S, \xi)\, d\xi, \tag{3.2} $$

where $p(S, \xi)$ is the prior probability distribution given the region $S$ and the time 
increment $\xi$, and $y_e(S, t)$ and $y_r(S, t)$ are the expected and the real drivable region at 
time $t$, respectively. For the curve-based preview model, 
(3.2) simplifies to 

$$ J = \int_{t_1}^{t_2} \left[ y_e(t+\xi) - y_r(t+\xi) \right]^2 p(\xi)\, d\xi, \tag{3.3} $$

where $y_e(t)$ and $y_r(t)$ are the expected and the real path at time $t$, respectively. 
Considering the fixed curvature, we assume that the ideal path of a vehicle is 

$$ y_r(t+\xi) = y_r(t) + \xi\, \dot{y}_r(t) + \frac{\xi^2}{2}\, \ddot{y}_r(t). \tag{3.4} $$

For the optimum curvature control, we write the vehicle acceleration as 

$$ \ddot{y}_r^{*}(t) = \frac{2}{\alpha} \left[ y_1(t) - y_r(t) \int_{t_1}^{t_2} \xi^2 p(\xi)\, d\xi - \beta\, \dot{y}_r(t) \right], \tag{3.5} $$

where 

$$ \beta = \int_{t_1}^{t_2} \xi^3 p(\xi)\, d\xi, \tag{3.6} $$

$$ \alpha = \int_{t_1}^{t_2} \xi^4 p(\xi)\, d\xi, \tag{3.7} $$

$$ y_1(t) = \int_{t_1}^{t_2} \xi^2 p(\xi)\, y_e(t+\xi)\, d\xi. \tag{3.8} $$

On the basis of the factors of road safety reasoning [2], we extend the factors of 
the future road situation as given below. 


1. Traffic rules. While a driver implements a driving activity, traffic rules have a 
potential effect on the expected path. Traffic rules on highways and urban roads 
ensure safe, comfortable, collision-free driving. To achieve these objectives, the 
driver and the on-board sensors must recognize the traffic signs. 

2. Vehicle dynamics. The motion of a vehicle is restricted by vehicle dynamics, 
which includes two factors: an internal factor involving the tires, steering system, and 
acceleration/deceleration systems, and an external factor involving driver 
instruction. In our framework, we consider them as a whole in their effect on safe driving 
rather than looking at each influencing sub-factor, and focus on the vehicle's steady-state 
and dynamic-state response properties. 

3. Driver behavior. The aim of I²DASW systems is to develop an automatic system 
that can fully or partly replace a professional and experienced driver. Clearly, 
driver behavior characteristics are inevitably involved in safe driving. Apart from 
road conditions and vehicle mechanical failures, most traffic accidents are 
caused by drivers, for instance, through inappropriate speed, ignoring right of way, 
overtaking, or following too closely. In real-life traffic scenes, driver behavior, such 
as driver response delay, strongly affects the driving operation. 

4. Sensor uncertainty. Sensor noise causes uncertainty in the data. Incomplete 
information is used to assess the scene, and modeling the background and dynamic 
objects is a challenge given the incomplete and uncertain information. 

5. Vehicle state. Vehicle state includes position, velocity, acceleration, direction 
angle, yaw angle, etc. The yaw angle greatly affects the dynamic properties of a vehicle. 
Given the basic state parameters, we can generate a predicted path. 


I²DASW systems have three major functions: 


1. Providing appropriate just-in-time information about the vehicle, driver, and 
traffic environment for safer and better driving. For example, Real-Time Traffic and 
Traveler Information (RTTI) aims at facilitating access to public data and 
providing drivers with information about the traffic environment and the other 
vehicles. 

2. Safety warning and assistance systems. The system warns the driver proactively 
about a possible hazardous situation on the basis of the vehicle's current 
position, orientation, and speed, and the road situation; moreover, steps can be taken 
to control the vehicle when it is in a hazardous situation. 
Safety warning systems monitor the driving situation and provide drivers with 
traffic situation information, for example, potential collision information; they include route 
guide systems, Lane Change Decision Aid Systems (LCDAS), Traffic 
Impediment Warning Systems (TIWS), Forward Vehicle Collision Warning Systems 
(FVCWS), etc. Safety assistance systems use the warning information to 
generate the expected path and control the vehicle directly. Typical systems are 
Forward Collision Avoidance Assistance Systems (FCAAS), ACC systems, Low 
Speed Following Systems (LSFS), Stop & Go systems, etc. 

3. In-vehicle safety protection devices for drivers and passengers. Such a system 
can protect drivers and passengers from the impact between humans and vehicle 
bodies, for example, smart airbag systems. 


Chapter 4 
Road Detection and Tracking 


4.1 Introduction 


Road detection and tracking are important tasks for many intelligent vehicle 
applications, such as Lane Departure Warning (LDW) systems, anti-sleep systems,¹ driver 
assistance and safety warning systems [49], and autonomous driving [9]. Road detection 
means locating road boundaries without prior knowledge of the road geometry, and 
includes a few basic tasks, namely, road localization and calculating the position of the 
vehicle with respect to the road, while road tracking updates the road parameters 
from the previous road parameters. Video-based road detection and tracking keeps 
drawing more and more attention since it has many advantages compared 
to active sensors. In general, there are two types of road detection and tracking 
approaches: one is for structured roads with yellow or white lane markings, the other 
is for unstructured roads. 
The main advantages of the video-based approach are as follows: 


• Vision sensors acquire data in a non-invasive way, thus not polluting the road 
environment. In other words, vision sensors do not interfere with each other when 
multiple intelligent vehicles are moving within the same area. By contrast, besides 
the problem of environment pollution, active sensors have some 
typical problems that require careful thought, such as the wide variation in reflection ratios 
caused by different factors (such as obstacle shape or material), the need for 
the maximum signal level to comply with safety rules, and the interference 
among active sensors of the same type. 

• In most Intelligent Transportation Systems (ITS) and Intelligent Vehicle (IV) 
applications, vision sensors play a fundamental role, for example, in lane marking 
localization, traffic sign recognition, and obstacle recognition. In those 
applications, other sensors, such as laser and radar, are only complementary to vision 
sensors. 

• We do not need to modify road infrastructure when using vision sensors to 
capture visual information. This is extremely important in practical applications. 

• Vision sensors can acquire visual information about the road environment with high 
spatial and temporal resolution, whereas both radar and laser sensors suffer from 
low spatial and temporal resolution. 


Hence, vision sensors possess key advantages over active ones for the foreseeable 
massive and widespread application on autonomous intelligent vehicles. At the 
same time, vision-based road detection and tracking approaches have a few 
limitations: 


• Vision sensors are less robust than laser sensors and radar sensors in extreme 
illumination and weather conditions, such as fog, night, strong sunshine, and rain. 

• The large amount of video/image data is a great challenge for embedded systems. As 
a result, specific computer architectures and parallel processing techniques have to be 
carefully considered to improve real-time performance. 

• On a bright sunny day, various objects, such as trees, buildings, cars, bridges, and 
channels, can cast shadows, thus changing road color and texture. 

• On-road vehicles and obstacles can occlude part of a road, thus resulting in 
discontinuities in the lane markings. 


In addition, to improve road detection and speed up the processing, some as- 
sumptions in road detection and tracking are made: 


• High contrast between lane markings and the rest of the road: Lane markings 
contrast strongly with the road background, which is the basic 
assumption for almost all lane-marking based approaches. Different color 
spaces are used in these approaches. Southall et al. extract lane markings from 
the red channel of color images since lane markings are either yellow or white 
[43]. Li et al. proposed to transform the RGB color space into the I1I2I3 color space 
[29], and the I2 = (R - B)/2 component is normalized to form a gray image, 
called the I2 image. The advantages of this transform are twofold: First, the high 
correlation among the R, G, and B components is removed. Second, the new 
color space is more effective with respect to the quality of segmentation and the 
computational complexity. 

• The continuity of lane boundary and marking edges: The continuity of lane 
boundary and marking edges is another basic assumption. Though some lane 
markings are dashed, they are continuous locally and can be linked 
to obtain a complete marking. 

• Regions of Interest (ROI) assumptions: Instead of processing the whole image, 
lane detection and tracking algorithms focus on specific regions of interest only. 
In the current image frame, lane detection and tracking algorithms seek ROIs 
using the results of previously processed frames or assuming prior knowledge 
of the road environment. When lane detection is integrated with lane tracking, the 
current lane parameters are predicted using the parameters of the previous frame 
and the vehicle dynamics, thus yielding search regions for the current detection 
[36]. 

• The fixed lane width assumption: The assumption of a fixed or smoothly 
varying lane width allows enhancing the search criterion, thus limiting the search to 
almost parallel lane markings. Furthermore, a lane marking feature extractor can be 
based on the fact that the lane marking width lies in a small range of possible values on 
a road [25], which implies geometric constraints on the observed lane-marking 
width. 

• Road geometric assumptions: The reconstruction of road geometry can be 
simplified by assuming its shape. In general, different models suit 
different applications, so the model has to be chosen carefully. The commonly used road 
models are straight road models [46], curved road models (clothoid lane 
models [31, 43], parabola models [33], quadratic models [24]), and 3D road models with 
horizontal and vertical curvature [26]. 
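The color-space assumption above can be illustrated with a short sketch of the I1I2I3 transform. The text only gives I2 = (R - B)/2; the expressions for I1 and I3 below are the commonly cited definitions of this color space and are included as an assumption. The linear shift used to map I2 to an 8-bit gray value is also an assumption, since the normalization used in [29] is not described here, and the function names are hypothetical.

```python
def rgb_to_i1i2i3(r, g, b):
    """Convert one RGB pixel to (I1, I2, I3)."""
    i1 = (r + g + b) / 3.0           # intensity-like component
    i2 = (r - b) / 2.0               # red-blue opponent component
    i3 = (2.0 * g - r - b) / 4.0     # green-magenta opponent component
    return i1, i2, i3


def i2_gray(r, g, b):
    """Map I2 of an 8-bit RGB pixel from [-127.5, 127.5] to an integer in 0..255.

    Assumed normalization: a plain linear shift, (R - B + 255) // 2.
    """
    return (r - b + 255) // 2


def i2_image(rgb_image):
    """Build the I2 gray image from a nested list of (R, G, B) pixels."""
    return [[i2_gray(r, g, b) for (r, g, b) in row] for row in rgb_image]


row = [(255, 255, 0), (100, 100, 100), (0, 0, 255)]   # yellow, gray, blue
print(rgb_to_i1i2i3(255, 255, 0))   # (170.0, 127.5, 63.75)
print(i2_image([row]))              # [[255, 127, 0]]
```

Under this particular normalization, any neutral pixel (R equal to B) maps to the mid-gray value, while yellowish pixels map to bright values.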


4.2 Related Work 


In this section, we introduce the state of the art in road detection and tracking. 


4.2.1 Model-Based Approaches 


Kluge et al. proposed a deformable template model of lane structures to locate 
lane boundaries without thresholding the intensity gradient information [37]. The 
Metropolis algorithm is used to maximize a function which evaluates how well the 
image gradient data supports a given set of template deformation parameters. Wang 
et al. presented a B-Snake which is capable of describing a wider range of lane 
structures and is constructed from a set of control points [45]. Moreover, Minimum Mean 
Square Error (MMSE) is used to estimate the control points from the overall image 
forces on the two sides of the lane. Tie Liu et al. presented a lane detection algorithm using a 
deformable template and a Genetic Algorithm (GA) [30]. In this approach, the first 
step is to preprocess a road image using an edge operator to yield edges, and then 
fit a deformable template model of road edges or marked lines. Here, a likelihood 
function defines the degree of fit for a given set of template deformation parameters. 
Afterwards, a GA is used to search for the global optimum of the likelihood 
function, thus yielding the optimal parameters of the deformable road model. 


4.2.2 Multi-cue Fusion Based Approach 


Apostoloff et al. proposed using particle filtering and multi-cue fusion 
technologies to robustly handle several lane detection and tracking issues, such as shadows 
on the road, unreliable lane markings, dramatic lighting changes, and 
discontinuous changes in road shapes and types [1]. Here, six cues are used in lane detection 
and tracking: lane markers, road edges, road colors, non-road colors, road width, 
and elastic lanes. Gern et al. presented a new approach fusing two different types 
of road features, white lane markings and horizontal optical flow [21]. First, 
a clothoidal lane geometry model is used to locally track the white markings. 
Second, inspired by how a human driver behaves under adverse weather 
conditions, the horizontal optical flow is calculated to track the motion of all 
road-parallel structures. This leads to a precise estimate of the road position and of the vehicle 
position relative to the lanes, even under adverse weather conditions. 


4.2.3 Hypothesis-Validation Based Approaches 


Pomerleau et al. proposed determining the curvature of the road ahead using a 
'Hypothesize and Validate' strategy [40]. Their RALPH system first hypothesizes road models 
with different curvatures, then subtracts these curvatures from the low-resolution 
image, and finally validates which hypothesized curvature best matches the original 
image. 


4.2.4 Neural Network Based Approaches 


In most prototypes of autonomous vehicles developed worldwide, road recognition 
and vehicle driving are two separate modules. However, some early systems were 
not based on preliminary road detection, but obtained driving commands 
directly from road images. For example, the Autonomous Land Vehicle in a Neural Net 
(ALVINN), Carnegie Mellon University's intelligent vehicle, is based on 
a single-hidden-layer back-propagation network [39]. The input layer of the neural 
network (NN) is a 30 x 32 two-dimensional video frame, and the output layer is a linear 
representation of the travel direction of the vehicle, chosen so that the vehicle is kept on 
the road. After training, the vehicle can autonomously follow the road. This system 
has driven on various road types under different conditions. 


4.2.5 Stereo-Based Approaches 


Stereo vision algorithms can relax the common assumptions about a road, such as flat 
road surfaces and constant pitch angles. Furthermore, depth information allows 
separating the road from obstacles. S. Nedevschi et al. modeled lanes as a 3D surface, 
defined by the vertical and horizontal clothoid curves, the lane width, and the roll 
angle. Also, the lane detection is integrated into a tracking process. In addition, the 
Generic Obstacle and Lane Detection (GOLD) system is based on a stereo vision hardware 
and software architecture [4, 8, 35], where Inverse Perspective Mapping (IPM) 
over both left and right stereo images is used to remove the perspective effect. 


4.2.6 Temporal Correlation Based Approaches 


When a road following algorithm is aimed at not only lane detection but also lane 
tracking, the temporal correlation between consecutive frames can be used either to 
ease the feature determination or to validate the result of the processing. Redmill 
et al. developed an image-based lane tracking system [41] in which the geometry 
and width of the current lane ahead of the vehicle are estimated frame by frame. 
Afterwards, the position and orientation of the vehicle are estimated w.r.t. the center 
line between the two lanes. 


4.2.7 Image Filtering Based Approaches 


McCall and Trivedi proposed a method for lane detection using steerable filters [32]. 
Steerable filters are robust with respect to lighting changes and shadows and work 
well in extracting both circular road markings and painted road markings. 
In addition, road/lane segmentation and obstacle detection in a dynamic scene as a 
part of the European PROMETHEUS project consist of a temporal filter, an edge 
detector, and a watershed transformation [6]. Here, the morphological ‘watershed’ 
transformation is used to locate the lane edges in the gradient images. 


4.3 Lane Detection Using Adaptive Random Hough Transform 


4.3.1 The Lane Shape Model 


For structured roads, lane models play an important role in lane detection, where 
some assumptions are made to better recover 3D lanes from 2D images. In this 
section, we assume that the two lane marks are parallel lines or concentric 
circular arcs on a flat ground plane. Let a pixel (u, v) in the image plane correspond 
to a point (x, y) on the ground plane. A circular arc with curvature k is then 
approximated by a parabola of the form 

$$x = \frac{1}{2} k y^2 + m y + b, \tag{4.1}$$

where b is the offset of the arc on the ground plane. These circular arcs on the 
ground plane are projected into curves in the image plane. These curves can be 
closely approximated in the image plane by [28] 

$$u = \frac{k'}{v - h_z} + b'(v - h_z) + v_p, \tag{4.2}$$

where $k' = ak$; $b'$ is related to b, the arc curvature $k'$, and the camera tilt angle; 
$v_p$ is a function of the tangent of the arc on the ground plane and the camera tilt; 
and $h_z$ is the image row of the horizon. 

Taking the derivative of (4.2) with respect to v, we have 

$$\frac{du}{dv} = -\frac{k'}{(v - h_z)^2} + b'. \tag{4.3}$$

Solving (4.3) for $b'$ and substituting the result into (4.2), we can represent (4.2) by 

$$u = \frac{2k'}{v - h_z} + \frac{du}{dv}\,(v - h_z) + v_p. \tag{4.4}$$

Now we transform the 3D parametric space of $k'$, $b'$, $v_p$ into the 2D parametric 
space of $k'$, $v_p$, thus reducing computational complexity and storage requirements. 
Furthermore, both $k'$ and $v_p$ are the same for all lane shape features, whereas $b'$ 
is feature specific. In other words, the lane edges approximately share $k'$ and $v_p$ 
and differ only in the value of the parameter $b'$. This allows us to estimate $k'$ and 
$v_p$ robustly and quickly by the Random Hough Transform (RHT). Therefore, both $k'$ 
and $v_p$ can be estimated directly from the raw edge point locations and orientations 
without grouping the edge points together into individual features. 

Given two pixels, $(u_1, v_1)$ and $(u_2, v_2)$, sampled in the gray-edge map, with local 
edge slopes $(du/dv)_1$ and $(du/dv)_2$, $k'$ and $v_p$ can be calculated as 

$$k' = \frac{1}{2}\,\frac{(u_1 - u_2) - \left(\frac{du}{dv}\right)_1 (v_1 - h_z) + \left(\frac{du}{dv}\right)_2 (v_2 - h_z)}{(v_1 - h_z)^{-1} - (v_2 - h_z)^{-1}}, \qquad v_p = u_1 - \frac{2k'}{v_1 - h_z} - \left(\frac{du}{dv}\right)_1 (v_1 - h_z). \tag{4.5}$$

Finally, we formulate lane detection as estimating $k'$ and $b'$ for the left and the 
right lane in road images. 
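
As a minimal sketch of (4.5), the following Python code recovers $k'$ and $v_p$ from two edge pixels. It assumes that each sampled pixel carries its local edge slope du/dv and that the horizon row $h_z$ is known. The names `estimate_k_vp` and `lane_pixel` are illustrative and do not come from the original system.

```python
def estimate_k_vp(p1, p2, h_z):
    """Solve (4.5) for (k', v_p) from two edge pixels.

    Each pixel is a tuple (u, v, dudv), where dudv is the local edge slope du/dv.
    """
    u1, v1, e1 = p1
    u2, v2, e2 = p2
    r1, r2 = v1 - h_z, v2 - h_z  # row offsets from the horizon row
    if r1 == 0 or r2 == 0 or r1 == r2:
        raise ValueError("pixels must lie on two distinct rows away from the horizon")
    k = ((u1 - u2) - e1 * r1 + e2 * r2) / (2.0 * (1.0 / r1 - 1.0 / r2))
    v_p = u1 - 2.0 * k / r1 - e1 * r1
    return k, v_p


def lane_pixel(v, k, b, v_p, h_z):
    """Return (u, du/dv) on the lane model (4.2)-(4.3) at image row v."""
    r = v - h_z
    return k / r + b * r + v_p, -k / r ** 2 + b
```

Because $b'$ is eliminated in (4.4), the two pixels may come from different lane edges (different $b'$) and still yield the shared $k'$ and $v_p$.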


4.3.2 The Adaptive Random Hough Transform 


Hough Transform (HT) is a classic parameter estimation approach, which is widely 
used in lane detection [17, 22, 34, 47]. Since image features can be processed 
independently, the HT is well suited to implementation on a parallel computing system. 
Ballard et al. generalized the HT to detect arbitrary shapes under a geometric 
transform [3]. However, increasing the number, range, and accuracy of the parameters 
may result in high computing complexity. In line detection, Illingworth et al. proposed 
implementing the HT efficiently by an adaptive accumulator array and a coarse-to-fine 
strategy. The advantage of this approach is that it can yield a solution to any given 
accuracy without increasing the array size. In addition, a two-step adaptive generalized 
HT for the detection of non-analytic objects under weak affine transformations was 
introduced in [16]. 

RHT can improve efficiency in the detection of analytic curves in an edge map, 
determining the n parameters of the curve of interest. Compared with the HT, the RHT 
has higher parameter accuracy, a larger scope of the parameter space, smaller storage 
requirements, and higher speed. However, the procedure is repeated to combine the 
discrete parameter values. Thereby, a wide range and high accuracy of the parameters 
could still yield a remarkably large computing burden and storage space requirement. 

Fig. 4.1 The ARHT algorithm flow 

Motivated by these problems, we present a new Adaptive RHT (ARHT) for lane 
detection, which combines the advantages of both the adaptive HT (AHT) and the RHT. 
Figure 4.1 shows the algorithm flow of the ARHT, illustrating the implementation 
strategy to detect the lane markings in a road image. 


A. Pixel Sampling on Edges The task of lane detection can be conducted on 
a binary edge image, usually obtained from grey-level images by either simple 
thresholding operations or standard edge detection techniques. 

However, when using a gradient operator to obtain edges, it is difficult to determine 
an optimal threshold that selects only the true lane edges, corresponding to the painted 
yellow and white lane marks or road boundaries, among many noisy edges. Also, in 
many road scenes it is not possible to select a threshold that eliminates noise 
edges without also eliminating many of the lane edge points of interest. In lane 
detection, curves often disappear during edge detection, and this becomes critical 
for the succeeding processes. Therefore, a better alternative is to use the whole gray 
edge map, so that no useful information is lost. However, the computational cost is 
high if all pixels are included. As a compromise, a very low threshold (e.g., 0.1 or 0.2) 
is applied to the gray edge magnitude map to remove points that are unlikely to belong 
to lane markings while ensuring that true edges corresponding to road boundaries 
survive, thus keeping the false negative rate low. The remaining problem is the 
selection of the real boundaries among the many candidate edges. 

In an RHT based on a binary edge map, every edge pixel is sampled uniformly, 
without considering its probability of being on a certain curve. In the ARHT, 
inspired by particle filtering [2], pixels in the gray edge map are weighted according 
to their gradient magnitudes, and then sampled according to their weights, 
i.e., pixels with larger gradient magnitudes are sampled more frequently. Here, the 
weight of a pixel with index n (n = 0, 1, ..., N - 1) is defined as 

$$w^{(n)}(u, v) = \frac{f_m^{(n)}(u, v)}{\sum_{j=0}^{N-1} f_m^{(j)}(u, v)}, \tag{4.6}$$

where $f_m(u, v)$ is the gradient magnitude, and $\sum_n w^{(n)} = 1$. 

The pixels are sampled as follows: 

(a) Form a sample-set D using pixels having nonzero gradient magnitude. 

(b) Calculate the weight $w^{(n)}(u, v)$ as defined in (4.6). 

(c) Store everything together, including the cumulative probability, as $(d^{(n)}, w^{(n)}, C^{(n)})$, 
where $C^{(0)} = 0$, $C^{(n)} = C^{(n-1)} + w^{(n)}$, n = 0, 1, ..., N - 1. 

(d) Generate a random number r ∈ [0, 1]. 

(e) Find the smallest q for which $C^{(q)} \ge r$ by binary subdivision. 


The element $d^{(q)}$ of D is taken as the sampled pixel. From the sampled pixels, we 
estimate the parameters $k'$ and $v_p$; the correctness of the parameters should also be 
verified. Since we cannot get the lane shape from $k'$ and $v_p$ alone, the parameter $b'$ 
has to be calculated. It can be found by forming a histogram of the accumulated 
gradient magnitude of the points on the candidate curve in the gray-level edge image. 
The pixels along the curve are accumulated as 

$$M = \sum_{c \in \text{Curve}} f_m(u_c, v_c), \tag{4.7}$$

where M represents the length of the curve determined by $k'$, weighted by gradient 
magnitude. If M exceeds a specified threshold, the curve is accepted as true. 

Once a marking is detected, the other marking can be obtained by some post 
analysis, such as a simple histogram step. 
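
The gradient-weighted sampling of steps (a) to (e) can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: pixels are (u, v) tuples, the cumulative table is built once per image, and the helper names `build_cumulative` and `sample_pixel` are hypothetical.

```python
import bisect
import random


def build_cumulative(magnitudes):
    """Cumulative probabilities C^(n) built from gradient magnitudes, cf. (4.6)."""
    total = float(sum(magnitudes))
    if total <= 0.0:
        raise ValueError("need at least one pixel with nonzero gradient magnitude")
    cumulative, acc = [], 0.0
    for m in magnitudes:
        acc += m / total          # add the normalized weight w^(n)
        cumulative.append(acc)
    cumulative[-1] = 1.0          # guard against floating-point round-off
    return cumulative


def sample_pixel(pixels, cumulative, rng=random):
    """Draw one pixel with probability proportional to its gradient magnitude."""
    r = rng.random()                          # step (d): r in [0, 1)
    q = bisect.bisect_left(cumulative, r)     # step (e): smallest q with C[q] >= r
    return pixels[q]
```

Binary search keeps each draw at O(log N), which matters because the RHT draws pixel pairs many times per image.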


B. Multi-Resolution Parameter Estimating Strategy A multi-resolution strategy 
is used both to reach a solution rapidly and to reduce computing complexity. 
We build a Gaussian pyramid, where each level $I_l$ is smoothed by 
a symmetric Gaussian kernel and resampled to obtain the next level $I_{l+1}$ by [19]: 

$$I_{l+1} = S_{\downarrow}(G_\sigma * I_l), \quad I_0 = I, \tag{4.8}$$

where I is the original image, $G_\sigma$ is a Gaussian kernel with bandwidth $\sigma$, and 
$S_{\downarrow}(\cdot)$ is a downsampling operator. Figure 4.2 shows a multi-resolution image 
representation for lane detection. 

Now we can roughly and efficiently locate the global optimum using the ARHT 
with a fixed accuracy. The parameters from the previous pyramid level are the ini- 
tial parameters of the ARHT for estimating more accurate ones. By doing so, we 
constrain the parameter search to a smaller range around the previous solution, thus 
reducing computing complexity and storage space. This coarse-to-fine strategy of- 
fers us an acceptable solution at an affordable computational cost, and thus speeds 
up the lane detection. 

Fig. 4.2 Gaussian pyramid of road images with resolution from 256 x 240 to 64 x 60 

In the Gaussian pyramid, the parameter relationships between two consecutive 
pyramid levels are 

$$k'_l = 4k'_{l+1}, \quad v_{p,l} = 2v_{p,l+1}, \quad b'_l = b'_{l+1}. \tag{4.9}$$
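
To illustrate how (4.9) carries an estimate from a coarse pyramid level to the next finer one, here is a small sketch. It assumes that each pyramid level halves the image size, so pixel coordinates and the horizon row double when moving to the finer level. The function names are hypothetical.

```python
def lane_u(v, k, b, v_p, h_z):
    """Column u of the lane curve at image row v, following model (4.2)."""
    r = v - h_z
    return k / r + b * r + v_p


def to_finer_level(k, v_p, b):
    """Map (k', v_p, b') from pyramid level l+1 to level l using (4.9)."""
    return 4.0 * k, 2.0 * v_p, b
```

A curve found at the coarse level then maps to the same curve at twice the scale, which gives the starting point for the narrower search at the finer level.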


Now, we discuss the error criterion regarding the elements in the ARHT. Usually, 
we consider two elements the same if they have the same coordinates. One 
alternative is to regard two elements as the same if the distance between them is 
smaller than a given tolerance ε. The smaller the tolerance, the higher the parameter 
accuracy when using the ARHT. Although ε is fixed in our approach, the parameter 
accuracy is still improved due to the multi-resolution image representation. 


4.3.3 Experimental Results 


This section presents the performance of the proposed method on real road 
scenes. The method extracts both the left and the right lane boundaries. The 
algorithm is tested on images from both video captured by an on-board camera 
in our lab and the image set provided by the Robotics Institute of CMU.² All 
experimental images are 24-bit color images of size 256 x 240. Figure 4.3 shows some 
of our experimental results of lane boundary detection, where detected boundaries are 
superimposed onto the original images. These images represent various real highway 
scenes, including a lane whose left and right markings are solid, a lane whose left 
marking is solid and right marking is broken, a lane whose left marking is broken 
and right marking is solid, a lane whose left and right markings are broken, a lane 
with shadows, a lane with a highlight in the far field, a lane whose left marking has 
a big gap, and a lane whose markings are fragmentary. The experimental results show 
that the method retains the desirable HT characteristics of robustness to extraneous 
data and the ability to detect model parameters from disturbed data, although 
imperfect detection occasionally happens because of traffic signs painted on the road. 


² http://vasc.ri.cmu.edu/idb/html/road/index.html. 

Fig. 4.3 Experimental results on different road scenes 

Fig. 4.4 Experimental comparison of a genetic algorithm and the ARHT for lane detection 


Figure 4.4 demonstrates the performance of genetic algorithm based lane detection 
[10] and ARHT based lane detection. The experimental comparison indicates 
that the latter has some advantages over the former. 


4.4 Lane Tracking 


4.4.1 Particle Filtering 


In principle, particle filtering is a sequential Bayes filtering approach, a.k.a. 
sequential Monte Carlo filtering [15], which is widely used in lane tracking [38, 
43]. Let $Z_k = \{z_0, z_1, \ldots, z_k\}$ denote the measurements up to time k, and 
$S_k = \{s_0, s_1, \ldots, s_k\}$ denote the states up to time k. 

To better understand the particle filter, we briefly review Bayes' filtering. The 
Bayes' rule is 

$$P(s \mid z) = \frac{P(z \mid s)\, P(s)}{P(z)} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}. \tag{4.10}$$

This equation indicates how we compute the posterior probability from the likelihood 
and the prior probability. In this Bayes' framework, we can determine s by 
finding the most probable value of s given data z. This technique is called Maximum 
A Posteriori (MAP) estimation. When P(s) is a constant for any value of s, MAP 
reduces to Maximum Likelihood Estimation (MLE). 

Furthermore, recursive Bayes' filtering consists of two steps: the prediction 
step and the updating step. In the prediction step, we calculate the distribution of s 
at time k according to a dynamic system model and the previous posterior probability 
at time k - 1 by 

$$p(s_k \mid Z_{k-1}) = \int p(s_k \mid s_{k-1})\, p(s_{k-1} \mid Z_{k-1})\, ds_{k-1}, \tag{4.11}$$

where $p(s_k \mid s_{k-1})$ is the probability density function (pdf) of the dynamic model. 
Afterwards, the updating step calculates $p(s_k \mid Z_k)$ given the likelihood and 
$p(s_k \mid Z_{k-1})$ by 

$$p(s_k \mid Z_k) = \frac{p(z_k \mid s_k, Z_{k-1})\, p(s_k \mid Z_{k-1})}{p(z_k \mid Z_{k-1})}, \tag{4.12}$$

where $p(z_k \mid s_k, Z_{k-1})$ is the measurement model, $p(s_k \mid Z_{k-1})$ is the prior model, and 
$p(z_k \mid Z_{k-1})$ is a normalizing constant that can be represented by 

$$p(z_k \mid Z_{k-1}) = \int p(z_k \mid s_k)\, p(s_k \mid Z_{k-1})\, ds_k. \tag{4.13}$$
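
For intuition, the prediction and updating steps (4.11) to (4.13) can be written out for a state that takes finitely many values, so that the integrals become sums. This is only a sketch under that discretization assumption, and the function names are illustrative.

```python
def predict(prior, transition):
    """Prediction step (4.11); transition[i][j] = p(s_k = j | s_{k-1} = i)."""
    n = len(prior)
    return [sum(prior[i] * transition[i][j] for i in range(n)) for j in range(n)]


def update(predicted, likelihood):
    """Updating step (4.12); likelihood[j] = p(z_k | s_k = j)."""
    unnormalized = [l * p for l, p in zip(likelihood, predicted)]
    evidence = sum(unnormalized)  # discrete counterpart of (4.13)
    if evidence == 0.0:
        raise ValueError("measurement has zero probability under the prediction")
    return [x / evidence for x in unnormalized]
```

Particle filtering replaces these exact sums, which are intractable for continuous lane states, by weighted samples.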


Now we turn to sampling algorithms to find a representation of the posterior 
probability. That is, we approximate the posterior distribution by a set of discrete, 
weighted particles sampled from it: 

$$\hat{p}(s_k \mid Z_k) = \frac{1}{N} \sum_{i=1}^{N} \delta\bigl(s_k - s_k^{(i)}\bigr), \tag{4.14}$$

where the $s_k^{(i)}$, i = 1, ..., N, are independent and identically distributed (i.i.d.). 
Let $s_{0:k} = \{s_0, s_1, \ldots, s_k\}$; we can then approximate any estimate of the form 
$E[f(s_{0:k})]$ by 

$$E[f(s_{0:k})] = \int f(s_{0:k})\, p(s_{0:k} \mid Z_k)\, ds_{0:k} \approx \frac{1}{N} \sum_{i=1}^{N} f\bigl(s_{0:k}^{(i)}\bigr). \tag{4.15}$$


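Unfortunately, it is often not possible to sample directly from the posterior 
distribution. Hence, we sample from a distribution $q(s_{0:k} \mid Z_k)$ that is easier to 
sample from, called the Proposal Distribution. Then we have 

$$\begin{aligned}
E[f(s_{0:k})] &= \int f(s_{0:k})\, \frac{p(s_{0:k} \mid Z_k)}{q(s_{0:k} \mid Z_k)}\, q(s_{0:k} \mid Z_k)\, ds_{0:k} \\
&= \int f(s_{0:k})\, \frac{p(Z_k \mid s_{0:k})\, p(s_{0:k})}{p(Z_k)\, q(s_{0:k} \mid Z_k)}\, q(s_{0:k} \mid Z_k)\, ds_{0:k}.
\end{aligned} \tag{4.16}$$

Let $w_k(s_{0:k}) = \frac{p(Z_k \mid s_{0:k})\, p(s_{0:k})}{q(s_{0:k} \mid Z_k)}$. Since the probability $p(Z_k)$ is 
independent of $s_{0:k}$, the estimate can be expressed as follows: 

$$\begin{aligned}
E[f(s_{0:k})] &= \frac{1}{p(Z_k)} \int f(s_{0:k})\, w_k(s_{0:k})\, q(s_{0:k} \mid Z_k)\, ds_{0:k} \\
&= \frac{\int f(s_{0:k})\, w_k(s_{0:k})\, q(s_{0:k} \mid Z_k)\, ds_{0:k}}{\int p(Z_k \mid s_{0:k})\, p(s_{0:k})\, \frac{q(s_{0:k} \mid Z_k)}{q(s_{0:k} \mid Z_k)}\, ds_{0:k}} \\
&= \frac{\int f(s_{0:k})\, w_k(s_{0:k})\, q(s_{0:k} \mid Z_k)\, ds_{0:k}}{\int w_k(s_{0:k})\, q(s_{0:k} \mid Z_k)\, ds_{0:k}} \\
&= \frac{E_{q(s_{0:k} \mid Z_k)}\bigl[w_k(s_{0:k})\, f(s_{0:k})\bigr]}{E_{q(s_{0:k} \mid Z_k)}\bigl[w_k(s_{0:k})\bigr]}.
\end{aligned} \tag{4.17}$$

Now we can approximate the estimate by sampling directly from the proposal 
distribution $q(s_{0:k} \mid Z_k)$ and get 

$$E[f(s_{0:k})] \approx \frac{\frac{1}{N} \sum_{i=1}^{N} w_k\bigl(s_{0:k}^{(i)}\bigr)\, f\bigl(s_{0:k}^{(i)}\bigr)}{\frac{1}{N} \sum_{i=1}^{N} w_k\bigl(s_{0:k}^{(i)}\bigr)} = \sum_{i=1}^{N} \tilde{w}_k^{(i)}\, f\bigl(s_{0:k}^{(i)}\bigr), \tag{4.18}$$

where 

$$\tilde{w}_k^{(i)} = \frac{w_k\bigl(s_{0:k}^{(i)}\bigr)}{\sum_{j=1}^{N} w_k\bigl(s_{0:k}^{(j)}\bigr)}. \tag{4.19}$$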

The above procedure is called Bayesian Importance Sampling. 

Since $q(s_{0:k} \mid Z_k) = q(s_k \mid s_{0:k-1}, Z_k)\, q(s_{0:k-1} \mid Z_{k-1})$, we have 

$$\begin{aligned}
w_k &= \frac{p(Z_k \mid s_{0:k})\, p(s_{0:k})}{q(s_{0:k} \mid Z_k)} = \frac{p(Z_k \mid s_{0:k})\, p(s_{0:k})}{q(s_k \mid s_{0:k-1}, Z_k)\, q(s_{0:k-1} \mid Z_{k-1})} \\
&= \frac{p(Z_{k-1} \mid s_{0:k-1})\, p(s_{0:k-1})}{q(s_{0:k-1} \mid Z_{k-1})} \cdot \frac{p(Z_k \mid s_{0:k})}{p(Z_{k-1} \mid s_{0:k-1})} \cdot \frac{p(s_{0:k})}{p(s_{0:k-1})} \cdot \frac{1}{q(s_k \mid s_{0:k-1}, Z_k)} \\
&= w_{k-1}\, \frac{p(z_k \mid s_k)\, p(s_k \mid s_{k-1})}{q(s_k \mid s_{k-1}, Z_k)}.
\end{aligned} \tag{4.20}$$

Note that in the second row of (4.20), $p(z_k, Z_{k-1} \mid s_{0:k}) = p(Z_{k-1} \mid s_{0:k})\, p(z_k \mid s_{0:k}) = 
p(z_k \mid s_k)\, p(Z_{k-1} \mid s_{0:k-1})$, and $p(s_{0:k}) = p(s_k \mid s_{0:k-1})\, p(s_{0:k-1}) = p(s_k \mid s_{k-1})\, p(s_{0:k-1})$. 
In practice, choosing the proposal distribution is important for a successful 
particle filtering approach. Usually, we can take $q(s_k \mid s_{k-1}, Z_k) = p(s_k \mid s_{k-1})$. Hence, 
sequential importance sampling is as follows: 

$$w_k = w_{k-1}\, p(z_k \mid s_k). \tag{4.21}$$

This equation means that we can obtain the estimates of the importance weights 
in a recursive way under the constraint of Markov dynamic models. Moreover, the 
weight update is proportional to the likelihood when the proposal distribution is 
the prior system equation. Finally, we would like to point out that there are two 
fundamental assumptions of particle filtering: a first-order Markov process for the 
states, and observation models in which each measurement depends only on the 
current state. 
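
The self-normalized estimate of (4.18) and (4.19) can be tried on a toy problem. The sketch below is illustrative only: the target density is known up to a constant (playing the role of the unknown $p(Z_k)$), the proposal is a wide Gaussian, and all names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a scalar Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def self_normalized_estimate(f, target, proposal_pdf, proposal_draw, n, rng):
    """Estimate E[f(s)] under an unnormalized target using proposal samples."""
    samples = [proposal_draw(rng) for _ in range(n)]
    weights = [target(s) / proposal_pdf(s) for s in samples]  # w = p / q
    total = sum(weights)                                      # denominator of (4.19)
    return sum(w * f(s) for w, s in zip(weights, samples)) / total
```

Because the weights are normalized by their own sum, any constant factor in the target cancels, which is why $p(Z_k)$ never has to be computed.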

We summarize the Basic Particle Filtering Algorithm as follows: 

1. Initialization: For k = 0, sample N particles $s_0^{(i)}$ (i = 0, ..., N - 1) from $p(s_0)$; 

2. Importance Sampling: Sample $\tilde{s}_k^{(i)}$ from $p(s_k \mid s_{k-1}^{(i)})$, then evaluate the 
importance weights $w_k^{(i)} = p(z_k \mid \tilde{s}_k^{(i)})$, and normalize the importance weights 
$\tilde{w}_k^{(i)} = w_k^{(i)} / \sum_j w_k^{(j)}$; 

3. Re-sampling: According to the normalized importance weights $\tilde{w}_k^{(i)}$, re-sample 
with replacement N particles $s_k^{(i)}$ (i = 0, ..., N - 1) from the set $\{\tilde{s}_k^{(i)}\}$ 
(i = 0, ..., N - 1); 

4. Then proceed to the Importance Sampling step when the next measurement 
arrives. 
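
The four steps above can be sketched for a scalar state. The random-walk dynamic model, the Gaussian likelihood, and the noise levels below are assumptions chosen only to make the example self-contained; they are not the lane models used later in this chapter.

```python
import math
import random


def particle_filter_step(particles, z, rng, process_std=0.1, meas_std=0.5):
    """One importance-sampling and re-sampling cycle for a scalar state."""
    # Importance sampling: propagate each particle through p(s_k | s_{k-1}).
    proposed = [s + rng.gauss(0.0, process_std) for s in particles]
    # Importance weights: likelihood p(z_k | s_k) of the new measurement.
    weights = [math.exp(-0.5 * ((z - s) / meas_std) ** 2) for s in proposed]
    total = sum(weights)
    if total == 0.0:
        normalized = [1.0 / len(proposed)] * len(proposed)
    else:
        normalized = [w / total for w in weights]
    # State estimate: weighted mean of the proposed particles.
    estimate = sum(w * s for w, s in zip(normalized, proposed))
    # Re-sampling with replacement according to the normalized weights.
    resampled = rng.choices(proposed, weights=normalized, k=len(proposed))
    return resampled, estimate
```

Calling this function once per measurement implements step 4; after re-sampling all particles carry equal weight, so only the likelihood enters the next weight update.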


4.4.2 Lane Model 


In this approach, lane tracking is formulated as the estimation of the lane's 
parameters and the vehicle's states. In this section, we introduce the lane model and 
the dynamic system model. 
We represent the lane shape by a Taylor series expansion of a clothoid [38, 43] 

$$y(x) = y_0 + \tan(\phi)\, x + \frac{a_0}{2} x^2 + \frac{a_1}{6} x^3, \tag{4.22}$$

where y is the lateral position of the road center relative to the vehicle, x is the 
longitudinal distance ahead, $\phi$ is the pitch of the camera relative to the road surface, 
and $a_0$ and $a_1$ are the curvature and curvature rate of the lanes. 

Fig. 4.5 An illustration of a vehicle dynamic system model at different times 


4.4.3 Dynamic System Model 


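The lane tracking algorithm is based on a 4D state vector $s_k$ 

$$s_k = [y_0(k),\ \phi(k),\ a_0(k),\ a_1(k)]^{T}. \tag{4.23}$$

Now we have the dynamic system model: 

$$\begin{bmatrix} y_0(k+1) \\ \phi(k+1) \\ a_0(k+1) \\ a_1(k+1) \end{bmatrix} =
\begin{bmatrix} 1 & \Delta x & \frac{\Delta x^2}{2} & \frac{\Delta x^3}{6} \\ 0 & 1 & \Delta x & \frac{\Delta x^2}{2} \\ 0 & 0 & 1 & \Delta x \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} y_0(k) \\ \phi(k) \\ a_0(k) \\ a_1(k) \end{bmatrix} +
\begin{bmatrix} 0 \\ -\Delta\psi_k \\ 0 \\ 0 \end{bmatrix}, \tag{4.24}$$

where $\Delta\psi_k$ is the yaw rate at time k. The vehicle dynamic system model is shown 
in Fig. 4.5. 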
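
As a sketch of the lane model and the state prediction, the code below evaluates a third-order clothoid approximation of the form $y(x) = y_0 + \tan(\phi)\,x + a_0 x^2/2 + a_1 x^3/6$ and propagates the state [y0, phi, a0, a1] over a travelled distance dx. The transition matrix and the sign of the yaw term follow the reading of (4.24) assumed here and should be checked against the actual system; the function names are hypothetical.

```python
import math


def lane_offset(x, y0, phi, a0, a1):
    """Lateral lane position at longitudinal distance x (clothoid approximation)."""
    return y0 + math.tan(phi) * x + a0 / 2.0 * x ** 2 + a1 / 6.0 * x ** 3


def transition_matrix(dx):
    """State transition over a travelled distance dx (assumed form of (4.24))."""
    return [
        [1.0, dx, dx ** 2 / 2.0, dx ** 3 / 6.0],
        [0.0, 1.0, dx, dx ** 2 / 2.0],
        [0.0, 0.0, 1.0, dx],
        [0.0, 0.0, 0.0, 1.0],
    ]


def predict_state(state, dx, yaw_change):
    """Predict [y0, phi, a0, a1] one step ahead; yaw sign is an assumption."""
    a = transition_matrix(dx)
    nxt = [sum(a[r][c] * state[c] for c in range(4)) for r in range(4)]
    nxt[1] -= yaw_change
    return nxt
```

In a particle filter, process noise would be added to the predicted state of every particle; that step is omitted here.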


4.4.4 The Imaging Model 


Having built the dynamic system model, we now need an observation model. The state 
is observed only through the mapping between image coordinates and world 
coordinates. The relationships between image coordinates (u, v) and vehicle 
coordinates (x, y) are [43] 

$$u = \frac{Y_c}{X_c} = \frac{y}{x\cos\phi + H\sin\phi}, \qquad v = \frac{Z_c}{X_c} = \frac{H\cos\phi - x\sin\phi}{x\cos\phi + H\sin\phi}. \tag{4.25}$$

Fig. 4.6 Vehicle, road and image coordinate systems. The road y-axis points into the page: 
(a) Geometric mapping between camera coordinates and world coordinates; (b) Image 
coordinate systems 

Note that $Z_c$ denotes the red line in Fig. 4.6 and $X_c$ denotes the blue line. Moreover, 
the relationships between image coordinates and pixel coordinates are 

$$u = \frac{i - u_0}{f_u}, \qquad v = \frac{j - v_0}{f_v}, \tag{4.26}$$

where $(u_0, v_0)$ is the principal point of the camera, and $f_u$ and $f_v$ are the effective 
focal lengths of the camera in the i and j directions, respectively. The relationship 
among vehicle coordinates, camera coordinates, and image coordinates is illustrated 
in Fig. 4.6. 

The camera pitch $\phi$ used in (4.25) is calculated as follows. From (4.25), 

$$v = \frac{H\cos\phi - x\sin\phi}{x\cos\phi + H\sin\phi} = \frac{H/x - \tan\phi}{1 + (H/x)\tan\phi}. \tag{4.27}$$

When $x \to \infty$, v tends to $v_p$, which can be represented as follows: 

$$v_p = -\tan\phi, \tag{4.28}$$

where $v_p$ is the v coordinate of the horizon line. 
Also, the camera height can be calculated by 

$$H = \frac{w\cos\phi}{I_r - I_l}, \tag{4.29}$$

where $I_r$ and $I_l$ are the image gradients of the right and left lanes, respectively, and w is the 
lane width. 
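
The imaging model can be sketched as two small functions: one projecting a road point (x, y) to normalized image coordinates for a camera at height H with pitch phi, and one converting to pixel coordinates with the principal point and focal lengths. The formulas follow the projection $u = y/(x\cos\phi + H\sin\phi)$, $v = (H\cos\phi - x\sin\phi)/(x\cos\phi + H\sin\phi)$ as read here; the function names and the numeric calibration values in any usage are hypothetical.

```python
import math


def road_to_image(x, y, height, pitch):
    """Project a ground-plane point (x, y) to normalized image coordinates (u, v)."""
    depth = x * math.cos(pitch) + height * math.sin(pitch)
    u = y / depth
    v = (height * math.cos(pitch) - x * math.sin(pitch)) / depth
    return u, v


def image_to_pixel(u, v, u0, v0, f_u, f_v):
    """Convert normalized image coordinates to pixel coordinates (i, j)."""
    return u * f_u + u0, v * f_v + v0
```

For points far ahead (large x), v approaches -tan(pitch), which is the horizon row used to recover the camera pitch.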

In this section, we need to calibrate the camera's intrinsic and extrinsic parameters: 
the principal point $(u_0, v_0)$, the focal lengths $(f_u, f_v)$, and the pitch angle $\phi$ as 
described before. We use Caltech's camera calibration toolbox for Matlab to obtain 
those parameters [48].³ 


4.4.5 The Algorithm Implementation 


The CONDENSATION algorithm [27] is used to estimate the shape of the road 
ahead of the vehicle. The basic idea is that the distribution is approximated by a set 
of N 'particles', i.e., pairs [s, π], where s is a state vector and π is a weight that 
reflects the plausibility of s as a representation of the true state of the system. In this 
section, we will introduce several basic issues about the lane tracking implementation 
based on particle filtering. 


4.4.5.1 Factored Sampling 


Let us first introduce the factored sampling algorithm, which is used to handle 
successive image observations. For non-Gaussian observations from image sequences, as 
we have mentioned before, lane tracking is formulated as a problem of estimating 
the parameters s(k). In this case, the posterior density represents all the knowledge 
about s from the observed data. From the Bayes' rule, we obtain 

$$p(s \mid z) = \eta\, p(z \mid s)\, p(s), \tag{4.30}$$

where $\eta$ is a normalization factor that usually cannot be calculated in a simple 
closed form. Hence, iterative sampling techniques can be used. 

The factored sampling algorithm generates a random variate s from a proposal 
distribution. First, we generate particles $\{s_k^{(n)}, n = 0, \ldots, N-1\}$ from the prior 
probability p(s) and assign each its weight $\pi_k^{(n)}$ according to the likelihood 
$p(z_k \mid s_k^{(n)})$ at time k as follows: 

$$\pi_k^{(n)} = \frac{p\bigl(z_k \mid s_k^{(n)}\bigr)}{\sum_{j=0}^{N-1} p\bigl(z_k \mid s_k^{(j)}\bigr)}. \tag{4.31}$$

Now, a re-sampling step is used to generate a new particle set $\{s_k'^{(n)}, n = 
0, \ldots, N-1\}$ as follows: 

1. Generate a uniformly distributed random number r ∈ [0, 1]; 
2. Seek the smallest j for which $c_k^{(j)} \ge r$; 
3. Set $s_k'^{(n)} = s_k^{(j)}$, where $c_k^{(j)}$ is the accumulated weight of particle j at time k. 


We would like to point out that the higher the weight of a particle, the more likely 
it is to be sampled. The weight $\pi_k^{(n)}$ determines how often the corresponding 
particle $s_k^{(n)}$ is selected, given the observation. When N is sufficiently large, the 
samples will approach the posterior density $p(s_k \mid z_k)$. That is, when N is large enough, 
the weighted average of all the particles will approach the true state. 

³ http://www.vision.caltech.edu/bouguetj/calib_doc/. 

Fig. 4.7 The filtering result of a road image 


4.4.5.2 The Observation and Measure Models 


In lane tracking, we need to update the weights as in (4.31) using successive image 
observations. This procedure is divided into two steps: generating observations, and 
measuring the similarity between the extracted pixels and those predicted from particles. 

In the observation step, we directly extract lane pixel positions as image 
observations. Similar to [43], we extract lane markings from the red channel of our color 
images. Afterwards, a 3 x 9 filter kernel is used to extract lane marks 

$$\begin{bmatrix} 1 & 3 & 5 & 7 & 9 & 7 & 5 & 3 & 1 \\ 1 & 3 & 5 & 7 & 9 & 7 & 5 & 3 & 1 \\ 1 & 3 & 5 & 7 & 9 & 7 & 5 & 3 & 1 \end{bmatrix}. \tag{4.32}$$

From Fig. 4.7, we can see that the lane marks are clearly enhanced after filtering. 
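
A direct way to apply a 3 x 9 kernel of this kind is plain 2D correlation over the red channel. The sketch below assumes each kernel row is 1 3 5 7 9 7 5 3 1, represents the image as a list of rows, and leaves border pixels at zero; these choices and the function name are assumptions for illustration.

```python
KERNEL_ROW = [1, 3, 5, 7, 9, 7, 5, 3, 1]
KERNEL = [list(KERNEL_ROW) for _ in range(3)]  # 3 x 9 kernel, identical rows


def filter_image(image):
    """Correlate a 2D image (list of rows) with the 3 x 9 kernel."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(4, cols - 4):
            acc = 0.0
            for dr in range(-1, 2):
                for dc in range(-4, 5):
                    acc += KERNEL[dr + 1][dc + 4] * image[r + dr][c + dc]
            out[r][c] = acc
    return out
```

The response peaks where a bright, roughly vertical stripe is centered under the kernel, which is the typical appearance of a lane mark in the image.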

Let us call the pixels extracted from the lanes object pixels. We search for the 
nearest neighbor of each predicted pixel among the object pixels, thus yielding the 
distance between the predicted pixels and the resulting object pixels, as shown in 
Fig. 4.8. In practice, for each predicted pixel set from the corresponding particle, we 
search for its object pixels in the yellow and gray search windows, respectively. 
Finally, we obtain the distance from each predicted pixel to its object pixel. 

After obtaining the distance measures between the predicted pixels of each particle 
and the corresponding object pixels, we calculate the weight of each particle. Let 
$p_i(s_{k+1}^{(n)})$ denote the ith predicted pixel from particle $s_{k+1}^{(n)}$ and let $\hat{p}_i(s_{k+1}^{(n)})$ 
denote the corresponding object pixel. First, we sum the distances of particle $s_{k+1}^{(n)}$ 

$$d^{(n)} = \sum_{i} d\bigl(p_i(s_{k+1}^{(n)}),\ \hat{p}_i(s_{k+1}^{(n)})\bigr). \tag{4.33}$$


Then, we calculate the weight $\pi_{k+1}^{(n)}$ of each particle using its sum of distances 

$$\pi_{k+1}^{(n)} = \frac{1}{d^{(n)}}. \tag{4.34}$$

Fig. 4.8 The search of object pixels: the blue and red pixels represent those from 
different particles 

Finally, we normalize the weights so that $\sum_n \pi_{k+1}^{(n)} = 1$. Moreover, the cumulative 
weights $c_{k+1}^{(n)}$ are 

$$c_{k+1}^{(n)} = c_{k+1}^{(n-1)} + \pi_{k+1}^{(n)}, \qquad c_{k+1}^{(0)} = 0. \tag{4.35}$$

Here, a new particle set $\{s_{k+1}^{(n)}, \pi_{k+1}^{(n)}, c_{k+1}^{(n)}\}$ is generated. 

According to the distances between each predicted pixel and its nearest neighbor 
within the object pixels of each particle, the predicted pixels of each particle have 
different scores. Accumulating their scores generates the weight of each particle. 
Figure 4.9 (left) shows the weights of the predicted pixels. 

Once we have the N particles, we calculate the state at time k + 1 as the weighted 
average of all the particles 

$$E[s_{k+1}] = \sum_{n=0}^{N-1} \pi_{k+1}^{(n)}\, s_{k+1}^{(n)}. \tag{4.36}$$

Figure 4.9 (right) shows the estimated results. 
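
The weight normalization, the running cumulative weights, and the weighted average of (4.36) can be sketched as follows. Particles are assumed to be lists [y0, phi, a0, a1], the input weights are any non-negative scores, and the function names are hypothetical.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def cumulative(weights):
    """Running sums of the normalized weights, used later for re-sampling."""
    out, acc = [], 0.0
    for w in weights:
        acc += w
        out.append(acc)
    return out


def expected_state(particles, weights):
    """Weighted average of the particle states, as in (4.36)."""
    dim = len(particles[0])
    return [sum(w * p[d] for w, p in zip(weights, particles)) for d in range(dim)]
```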


4.4.5.3 The Algorithm Flow 


We summarize the lane tracking algorithm based on the particle filter as follows: 
Input: A particle set $\{s_k^{(n)}, \pi_k^{(n)}, c_k^{(n)}\}$, n = 0, ..., N - 1, at time k, and the 
observed image at time k + 1; 
Iteration: (n = 0, ..., N - 1) 

1. Sample Selection: Select a sample $s_k'^{(n)}$ from the particle set based on the 
cumulative weights $\{c_k^{(n)}\}$. 

2. Prediction by the Dynamic Evolution Model: Equation (4.24) is used to 
predict a new particle $s_{k+1}^{(n)}$ from $s_k'^{(n)}$. 

3. Updating the Weights of Particles: We evaluate the plausibility of the 
evolved particle by comparing the predicted pixels from the particle to object pixels 
from the current observed image. Equations (4.34) and (4.35) update the weight 
$\pi_{k+1}^{(n)}$ and the cumulative weight $c_{k+1}^{(n)}$ of the current particle. 

Calculating lane position at time k + 1: The lane position at time k + 1 is calculated 
by (4.36). 

Fig. 4.9 The predicted pixel distribution and their weights. Darker color indicates higher 
score. The green lines are estimated lane marks calculated from the weighted average of 
N particles. The predicted points which are closer to lane marks have higher scores 

Fig. 4.10 Lane tracking using the particle filtering approach 

Figure 4.10 shows the lane tracking results using the particle filtering approach. 


4.5 Road Recognition Using a Mean Shift Algorithm 


In the previous section, we assumed that there exist multiple lane marks on the 
so-called structured roads, such as highways. Hence, we could define parameterized 
lane models and then estimate the parameters of lane marks. However, there are no 
visible lane marks on unstructured roads, such as county roads [5]. In this case, 
we have to use other visual cues, such as textures and colors. For example, the Rapidly 
Adapting Lateral Position Handler (RALPH) system [40] combines the color and 
the texture of a road to better recognize it. In this section, we use Mean Shift (MS) 
algorithms to cluster road pixels and non-road pixels based on color and texture 
features, thus resulting in road recognition. 

Fig. 4.11 The road image and its feature space distribution: (a) the original road image 
with resolution of 256 x 256; (b) the feature space distribution of (a) 

Feature space analysis approaches are widely used in low-level visual processing 
tasks [12], where probability density estimation is the most basic algorithm. The 
goal of feature space analysis is to seek significant features, space structures, and 
even subspaces. The denser regions in the feature space could correspond to important 
features, which leads to data clustering. The feature space analysis based on 
probability density estimation consists of two steps. The first step is to represent the 
feature space by some distribution. The most commonly used feature space 
representation is the Gaussian Mixture Model (GMM) [7, 42]. However, the number 
of components in a GMM must be known a priori. Hence, for arbitrarily structured 
feature spaces (shown in Fig. 4.11), we have to turn to nonparametric approaches which 
do not make such assumptions. The second step is to seek significant features based on 
the previous parametric/nonparametric approaches. 


4.5.1 The Basic Mean Shift Algorithm 


Nowadays, mean shift algorithms are widely used in the computer vision community as 
a robust feature space analysis approach, for example, in data clustering [10], image 
and video segmentation [12, 44], and visual tracking [11, 14]. Originally, the mean 
shift algorithm was proposed by Fukunaga et al. in 1975 as a nonparametric data 
clustering approach based on the gradient estimation of probability density functions 
(pdfs) [20]. Later, Cheng further generalized and analyzed this algorithm and its 
properties [10], which renewed researchers' interest in its applications. In the 
computer vision community, Peter Meer and others first applied this algorithm to 
various computer vision tasks, such as image segmentation [12, 13] and non-rigid 
object tracking [14]. The basic idea is to repeatedly move the nearby data points 
to their mode. Finally, the iteration procedure will converge to its global optimal 
solution [10]. In principle, the mean shift algorithm is an iterative multi-start global 
optimization approach [10, 18]. 

Given N features $x_i \in R^D$, i = 0, 1, ..., N - 1, within a feature space, we estimate 
the pdf using a symmetric kernel density estimator: 

$$p_K(x) = \alpha \sum_{i=0}^{N-1} K(x, x_i, H) = \alpha \sum_{i=0}^{N-1} K\bigl(\|x - x_i\|^2, H\bigr), \tag{4.37}$$

where $\alpha = 1/(N\sqrt{|H|})$ and $\int K(x)\,dx = 1$. In practice, we simplify the complexity of 
H by taking $H = h^2 I$ and thus $\alpha = \frac{1}{N h^D}$. 

The gradient of (4.37) is 

$$\nabla p_K(x) = \alpha_1 \sum_{i=0}^{N-1} (x - x_i)\, K'\bigl(\|x - x_i\|^2, H\bigr), \tag{4.38}$$

where $\alpha_1$ is a normalization factor. 
Now, defining a function $G(\|x - x_i\|^2, H) = -K'(\|x - x_i\|^2, H)$, we get 

$$\nabla p_K(x) = \alpha_1 \left[\sum_{i=0}^{N-1} G\bigl(\|x - x_i\|^2, H\bigr)\right] \left[\frac{\sum_{i=0}^{N-1} x_i\, G\bigl(\|x - x_i\|^2, H\bigr)}{\sum_{i=0}^{N-1} G\bigl(\|x - x_i\|^2, H\bigr)} - x\right], \tag{4.39}$$

where $G(\cdot) > 0$. In the above equation, the first term is proportional to the density 
estimate $p_G(x) = \alpha_2 \sum_{i=0}^{N-1} G(\|x - x_i\|^2, H)$, with $\alpha_2$ a normalization factor. 
Let us define 

$$y_j = \frac{\sum_{i=0}^{N-1} x_i\, G\bigl(\|x - x_i\|^2, H\bigr)}{\sum_{i=0}^{N-1} G\bigl(\|x - x_i\|^2, H\bigr)}, \tag{4.40}$$

and thus the second term is $m_G(x) = y_j - x$. Actually, $y_j$ is the filtered result of $x_j$ 
by weighted neighbors within the feature space in mean shift filtering. 
Therefore, we obtain the mean shift vector by 

$$m_G(x) = \alpha_3\, \frac{\nabla p_K(x)}{p_G(x)}, \tag{4.41}$$

where $\alpha_3$ is a normalization factor. From (4.41), we can see that (i) the mean shift 
vector is proportional to the normalized density gradient w.r.t. the kernel K, (ii) the 
normalized weighted mean $y_j$ w.r.t. the kernel G is a weighted average within the 
neighbors of $x_j$, and (iii) the mean shift vector always points in the direction of the 
maximum increase of the density. 

Now we discuss kernel functions in mean shift algorithms. The $D$-dimensional multivariate Gaussian kernel is

$$K_N(x) = \frac{1}{\sqrt{(2\pi)^D}} \exp\left(-\frac{1}{2}\|x\|^2\right). \tag{4.42}$$

Another popular kernel is the Epanechnikov kernel [12]

$$K_E(x) = \begin{cases} \frac{1}{2} c_d^{-1} (d+2)\,(1 - \|x\|^2) & \text{if } \|x\| < 1, \\ 0 & \text{otherwise,} \end{cases} \tag{4.43}$$

where $d$ is the dimension of the feature space and $c_d$ is the volume of the unit $d$-dimensional sphere.

Note that in the mean shift algorithm, the above two are radially symmetric kernels. Furthermore, we will use a two-dimensional kernel. Assuming an image is a two-dimensional lattice of $D$-dimensional pixels, the multivariate kernel is represented as [12, 13]:

$$K(x, h_s, h_r) = \frac{C}{h_s^2 h_r^D}\, k\left(\left\|\frac{x^s}{h_s}\right\|^2\right) k\left(\left\|\frac{x^r}{h_r}\right\|^2\right), \tag{4.44}$$

where $h = (h_s, h_r)$ is the kernel bandwidth, $C$ is a constant, and $x^s$ and $x^r$ are the position part and the range part of a feature vector.

In the following section, we will introduce different computer vision tasks using basic mean shift algorithms.


























4.5.2 Various Applications of the Mean Shift Algorithm 


Mean Shift Clustering The most basic application of mean shift algorithms is data clustering. Given feature vectors $x_i$ and their resulting labels $z_i$, we can use the following procedure to filter feature vectors:

• Initialization: $y_{i,1} = x_i$;

• Repeatedly calculate $y_{i,k+1}$ from $y_{i,k}$ using the kernel $G$ at time $k+1$ as follows:

$$y_{i,k+1} = \frac{\sum_{j=0}^{N-1} x_j\, G(\|x_j - y_{i,k}\|^2, H)}{\sum_{j=0}^{N-1} G(\|x_j - y_{i,k}\|^2, H)}. \tag{4.45}$$

Iterating until convergence, we finally get $y_{i,\infty}$, thus yielding the cluster centers $y_l$, $l = 0, 1, \ldots, L-1$.

• Label features: $z_i = \{x_i, y_{i,\infty}, w_i\}$.
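The clustering procedure above can be illustrated with a minimal sketch. It assumes a Gaussian kernel with $H = h^2 I$, and it groups converged points that lie within a hypothetical `merge_radius` of each other; the function names and the merging rule are illustrative choices, not part of the original algorithm description.

```python
import math

def mean_shift_point(y, xs, h, max_iter=200, tol=1e-6):
    """Iterate (4.45) from the start point y until the shift is below tol."""
    for _ in range(max_iter):
        # Gaussian weights G(||x_j - y||^2, h^2 I), up to a constant factor
        w = [math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)) / h**2)
             for x in xs]
        total = sum(w)
        y_new = [sum(wj * x[d] for wj, x in zip(w, xs)) / total
                 for d in range(len(y))]
        shift = math.dist(y, y_new)
        y = y_new
        if shift < tol:
            break
    return y

def mean_shift_cluster(xs, h, merge_radius):
    """Return (centers, labels): the modes reached from every x_i and,
    for each x_i, the index of the mode it converged to."""
    centers, labels = [], []
    for x in xs:
        y = mean_shift_point(list(x), xs, h)   # initialization y_{i,1} = x_i
        for l, c in enumerate(centers):
            if math.dist(y, c) < merge_radius:  # same mode as an earlier point
                labels.append(l)
                break
        else:
            centers.append(y)
            labels.append(len(centers) - 1)
    return centers, labels
```

For two well-separated groups of points, the sketch returns two centers and assigns every point the index of its own group. The bandwidth `h` controls how many modes survive: a larger value smooths the density estimate and merges nearby clusters.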


The Mean Shift Segmentation Similar to mean shift clustering, we assume that $x_i$ and $y_i$ are the feature vectors and the filtered vectors. The goal of image segmentation is to yield the labels $l_i \in \{0, 1, \ldots, L-1\}$ of all pixels. The detailed algorithm flow is as follows:

• Extract the feature vectors $x_i$, $i = 0, 1, \ldots, N-1$, of all pixels.
• Apply mean shift filtering to all pixels $x_i$, and thus generate the clusters

$$\{y_l\}, \quad l = 0, 1, \ldots, L-1. \tag{4.46}$$

• Assign labels $l_i = \{c \mid y_{i,\infty} \in y_c\}$.
• Post-processing: remove image regions with fewer than a predefined number of pixels.




Mean Shift Tracking In principle, given the target position in the previous frame, visual tracking estimates the target position in the current frame. Let $q_x$ represent the density function of the target model based on feature $x_i$, and let $p_x(y)$ be the pdf of a candidate target at position $y$. Hence, mean shift visual tracking seeks the position $\hat{y}$ which is the most similar to $q_x$ by [14]

$$\hat{y} = \arg\min_{y} \sqrt{1 - \rho(p_x(y), q_x)}. \tag{4.47}$$

The basic algorithm flow of mean shift tracking is as follows:

• Detect the initial position $\hat{y}_0$ of the target in the first frame and thus compute the target distribution $\{\hat{q}_u\}$, $u = 0, 1, \ldots, m-1$.

• Initialize the target position at frame $k$ with $\hat{y}_0$ and then calculate the distribution $\{\hat{p}_u(\hat{y}_0)\}$, $u = 0, 1, \ldots, m-1$, where the computation of the distribution follows [14]. Hence, we evaluate

$$\rho[\hat{p}(\hat{y}_0), \hat{q}] = \sum_{u=0}^{m-1} \sqrt{\hat{p}_u(\hat{y}_0)\, \hat{q}_u}. \tag{4.48}$$

• Calculate the weight of each position $x_i$, $i = 0, 1, \ldots, n_h - 1$, of candidate targets

$$w_i = \sum_{u=0}^{m-1} \delta[b(x_i) - u] \sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}}, \tag{4.49}$$

where $u$ is the color value, $\delta(\cdot)$ is the Kronecker delta function, and $b(x_i)$ is the histogram bin corresponding to the color value of pixel $x_i$.

• Calculate the new position

$$\hat{y}_1 = \frac{\sum_{i=0}^{n_h-1} x_i\, w_i\, G(\|\hat{y}_0 - x_i\|^2, H)}{\sum_{i=0}^{n_h-1} w_i\, G(\|\hat{y}_0 - x_i\|^2, H)}, \tag{4.50}$$

and $\{\hat{p}_u(\hat{y}_1)\}$, $u = 0, 1, \ldots, m-1$, and also $\rho[\hat{p}(\hat{y}_1), \hat{q}] = \sum_{u=0}^{m-1} \sqrt{\hat{p}_u(\hat{y}_1)\, \hat{q}_u}$.

• While $\rho[\hat{p}(\hat{y}_1), \hat{q}] < \rho[\hat{p}(\hat{y}_0), \hat{q}]$, do $\hat{y}_1 \leftarrow \frac{1}{2}(\hat{y}_0 + \hat{y}_1)$.

• If $\|\hat{y}_1 - \hat{y}_0\| < \varepsilon$, stop; otherwise, set $\hat{y}_0 \leftarrow \hat{y}_1$ and go to Step 2.
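The three quantities used inside the tracking loop, namely the similarity (4.48), the pixel weights (4.49) and the new position (4.50), can be sketched as below. This is only an illustration under stated assumptions: histograms are plain lists that sum to one, `bins[i]` holds the bin index $b(x_i)$ of pixel $i$, the kernel $G$ is taken to be Gaussian with a scalar bandwidth `h`, and the small constant `eps` that guards against empty bins is a hypothetical addition.

```python
import math

def bhattacharyya(p, q):
    """rho[p, q] = sum_u sqrt(p_u * q_u), Eq. (4.48)."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))

def pixel_weights(bins, p, q, eps=1e-12):
    """w_i = sqrt(q_u / p_u) with u = b(x_i), Eq. (4.49).

    The Kronecker delta in (4.49) simply selects the bin of pixel i,
    so the sum over u reduces to a single table lookup.
    """
    return [math.sqrt(q[u] / max(p[u], eps)) for u in bins]

def new_position(positions, weights, y0, h):
    """Weighted mean of the pixel positions, Eq. (4.50), Gaussian G assumed."""
    g = [math.exp(-0.5 * math.dist(x, y0) ** 2 / h**2) for x in positions]
    den = sum(w * gi for w, gi in zip(weights, g))
    return tuple(
        sum(x[d] * w * gi for x, w, gi in zip(positions, weights, g)) / den
        for d in range(len(y0))
    )
```

Pixels whose color is under-represented in the candidate histogram relative to the target model receive weights above one, which pulls the new position toward them.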


4.5.3 The Road Recognition Algorithm 


Commonly used color spaces include RGB, HSV, CIE XYZ, and CIE L*u*v*; the appropriate choice depends on the application. The RGB color space is a linear color space which is commonly used for display. Similar to [12], we use L*u*v* to represent pixel features, since this color space is approximately perceptually uniform. The L*u*v* color space is transformed from the RGB color space by

$$\begin{aligned} L^* &= 116 \sqrt[3]{Y/Y_0} - 16, \\ u^* &= 13 L^* (u' - u_0), \\ v^* &= 13 L^* (v' - v_0), \end{aligned} \tag{4.51}$$

where $Y/Y_0 > 0$, $u' = \frac{4X}{X + 15Y + 3Z}$, $v' = \frac{9Y}{X + 15Y + 3Z}$, and

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.607 & 0.174 & 0.200 \\ 0.299 & 0.587 & 0.114 \\ 0 & 0.066 & 1.116 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}. \tag{4.52}$$

Regarding this space, we can use the following equation to measure color distance:

$$\Delta E = \sqrt{(\Delta L^*)^2 + (\Delta u^*)^2 + (\Delta v^*)^2}. \tag{4.53}$$
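As a rough illustration of (4.51) to (4.53), the sketch below converts RGB values in [0, 1] to L*u*v* and measures the color distance. Several details are assumptions rather than statements of the original method: the reference white $(Y_0, u_0, v_0)$ is taken to be RGB = (1, 1, 1), the $Y$ row uses the standard NTSC luminance coefficients (0.299, 0.587, 0.114), $u'$ and $v'$ are the CIE 1976 chromaticities, and the linear branch that the CIE definition uses for very dark colors is omitted.

```python
import math

# RGB -> XYZ matrix of (4.52); the Y row holds the NTSC luminance weights.
M = ((0.607, 0.174, 0.200),
     (0.299, 0.587, 0.114),
     (0.000, 0.066, 1.116))

def rgb_to_xyz(rgb):
    return tuple(sum(m * c for m, c in zip(row, rgb)) for row in M)

def _uv(xyz):
    """CIE 1976 chromaticities u', v'; (0, 0) is returned for pure black."""
    x, y, z = xyz
    d = x + 15.0 * y + 3.0 * z
    return (4.0 * x / d, 9.0 * y / d) if d > 0 else (0.0, 0.0)

WHITE = rgb_to_xyz((1.0, 1.0, 1.0))   # assumed reference white

def rgb_to_luv(rgb):
    """Eq. (4.51), cube-root branch only (not accurate for very dark colors)."""
    xyz = rgb_to_xyz(rgb)
    L = 116.0 * (xyz[1] / WHITE[1]) ** (1.0 / 3.0) - 16.0
    u, v = _uv(xyz)
    u0, v0 = _uv(WHITE)
    return (L, 13.0 * L * (u - u0), 13.0 * L * (v - v0))

def delta_e(luv1, luv2):
    """Euclidean color distance of (4.53)."""
    return math.dist(luv1, luv2)
```

With this choice of reference white, white maps to $L^* = 100$ with zero chromatic components, and any gray level has $u^* = v^* = 0$, so the distance between two grays depends on lightness only.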


We summarize the road recognition using mean shift segmentation as follows:

• Down-sample the original images to reduce computational complexity.
• Extract pixel features in the L*u*v* color space.
• Segment the image using the mean shift segmentation algorithm.
• Remove the regions which contain fewer than a predefined number of pixels.
• Recognize road regions using a road reference region; here we use a small image region in front of the vehicle as the road reference region.
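The last step of the summary can be sketched as follows, assuming the segmentation has already produced a 2D array of region labels. The rule used here, marking every region that covers at least a share `min_share` of the reference window as road, is a hypothetical decision rule chosen for illustration; the text only states that a small region in front of the vehicle serves as the road reference.

```python
from collections import Counter

def road_mask(labels, ref_rows, ref_cols, min_share=0.2):
    """Mark as road every segmented region that dominates the reference window.

    labels   : 2D list of region labels produced by the segmentation step
    ref_rows : row indices of the reference window in front of the vehicle
    ref_cols : column indices of the reference window
    """
    counts = Counter(labels[r][c] for r in ref_rows for c in ref_cols)
    n = sum(counts.values())
    road = {lab for lab, k in counts.items() if k / n >= min_share}
    return [[lab in road for lab in row] for row in labels]
```

For example, with a window placed at the bottom center of the label image, all pixels that share a label with that window are returned as `True`, including road pixels far from the window.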


4.5.4 Experimental Results and Analysis 


We evaluate the road recognition algorithm on CMU's road dataset,⁴ which is a series of road images taken from various Navlabs [23].

Figure 4.12 shows a road image captured on a sunny day. Figure 4.12(a) is an 
original image with resolution 256 x 240, Figs. 4.12(b) and 4.12(c) correspond to 
the results of using mean shift segmentation when region_max = 10 and 100, re- 
spectively. Figure 4.13 shows feature space distributions after using median filtering 
and mean shift filtering. Similar to Fig. 4.12, Fig. 4.14 presents the results of using 
mean shift segmentation but on a cloudy image. 

From the experiments above, we can see that: (i) the feature space analysis using the mean shift algorithm is robust to different illumination conditions; (ii) mean shift filtering performs remarkably better than median filtering; (iii) post-processing can improve the recognition results.


⁴ http://vasc.ri.cmu.edu/idb/html/road/index.html
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Fig. 4.12 Sunny country road images 
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Fig. 4.13 The feature space distributions after using median filtering and mean shift filtering 





Fig. 4.14 The lane detection on cloudy country road images. (a) The original image with resolution of 256 x 240. (b) Region Max = 10; the identified available areas for traffic. (c) Region Max = 100; the identified available areas for traffic.
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Chapter 5 
Vehicle Detection and Tracking 


5.1 Introduction 


Statistics show that about 60% of rear-end crashes could be avoided if the driver had additional warning time. According to the Ministry of Public Safety of P.R. China, there were 567,753 reported road traffic accidents in 2004, and about 80% of the severe police-reported traffic accidents were vehicle-vehicle crashes. Almost two-fifths of these crashes resulted in an injury, with over 2% of the total crashes resulting in a death. Clearly, vehicle detection is an important research area of intelligent transportation systems [2, 11, 20]. It is used in, among others, adaptive cruise control (ACC), driver assistance systems, automated visual traffic surveillance (AVTS), and self-guided vehicles. However, robust vehicle detection in real-world traffic scenes is challenging.

Currently, IDASW systems based on radar cost more than those based on machine vision, and they have a narrow field of view and poor lateral resolution. In adaptive cruise control (ACC) systems, a camera can detect cut-in and overtaking vehicles from the adjacent lane earlier than a radar. For these reasons, radar-based systems are more difficult to apply in practical IDASW systems. Consequently, robust real-time vehicle detection in video is attracting growing attention from researchers all over the world [2, 4, 14].

To detect on-road vehicles in time, this chapter introduces a multi-resolution hypothesis-validation structure. Inspired by A. Broggi [2], we extract three ROIs: a near one, one in the middle, and a far one, from a 640 x 480 image. His approach uses fixed regions at the cost of flexibility; we remove this limitation and build a simple and efficient hypothesis-validation structure which consists of the three steps described below:


1. ROI determination: We generate ROI candidates using a vanishing point of the 
road in the original image. 

2. Vehicle hypothesis generation for each ROI using horizontal and vertical edge detection: We create multi-resolution vehicle hypotheses based on the preceding candidate regions. From the analysis of edge histograms, we generate hypotheses for each ROI and combine them into a single list.

3. Hypothesis validation using Gabor features and SVM classifiers: We conduct vehicle validation using the boosted Gabor features of 9 sub-windows and the SVM classifiers. Based on the classifier outputs, we determine whether each hypothesis represents a vehicle or a non-vehicle.


5.2 Related Work 


Hypotheses are generated using some simple features, such as color, horizontal 
and/or vertical edges, symmetry [2, 5], motion, and stereo visual cue. Zehang Sun 
proposed a multi-scale hypothesis method in which the original image was down- 
sampled to 320 x 240, 160 x 120, and 80 x 60. His vehicle hypotheses were gen- 
erated by combining the horizontal and vertical edges of these three levels, and 
this multi-scale method greatly reduced random noise. This approach can generate 
multiple-hypothesis objects, but a near vehicle may prevent a far vehicle from being 
detected. As a result, the method fails to generate the corresponding hypothesis of 
the far vehicle, reducing the vehicle detection rate. 

B. Leibe et al. presented a video-based 3D dynamic scene analysis system for a moving vehicle [9], which integrated scene geometry estimation, 2D vehicle and pedestrian detection, 3D localization, and trajectory estimation. Impressively, this paper presented a multi-view/multi-category object detection approach in real-world traffic scenes. Furthermore, 2D vehicle and pedestrian detections are converted into 3D observations.

Vehicle symmetry is an important cue in vehicle detection and tracking. Inspired 
by the voting of Hough Transform, Yue Du et al. proposed a vehicle following ap- 
proach by finding the symmetry axis of a vehicle [5]; however, their approach has 
several limitations, such as large computing burden, and it only generates one object 
hypothesis using the best symmetry. Alberto Broggi introduced a multi-resolution 
vehicle detection approach, and proposed dividing the image into three fixed ROIs: 
one near the host car, one far from the host car, and one in the middle [2]. This 
approach overcomes the limit of only being able to detect a single vehicle in the 
predefined region of the image, but it needs to compute the symmetry axis, making 
it not real-time. 

D. Gabor first proposed the 1D Gabor function in 1946 and J.G. Daugman ex- 
tended it to 2D later. In fact, a Gabor filter is a local bandpass filter that can reach the 
theoretical limit for the spatial domain and the frequency domain simultaneously. 
Consequently, Gabor filters have been successfully applied for object representation 
in various computer vision applications, such as texture segmentation and recogni- 
tion [18], face recognition [19], scene recognition, and vehicle detection [14]. 

The basic issue of a Gabor filter is how to select the parameters of a filter that responds mainly to an object of interest, such as a vehicle or a pedestrian. Accurate detection only occurs if the parameters defining the Gabor filters are well selected. Three main approaches have been proposed in the literature for selecting Gabor filters for object representation: manual selection, Gabor filter bank design (including filter design) [18], and learning approaches [13, 14, 16, 19]. In [1], Ilkka Autio proposed an approach for manual selection: an initial set of Gabor filters was experimentally selected from a larger set and then manually tuned. In general, Gabor filter bank design defines a small filter pool and determines the parameters of its filters independently of the application domain; moreover, the bandwidth in those Gabor filter design approaches cannot be determined autonomously. In image browsing and retrieval, a filter bank with 6 directions and 4 scales is used to compute the features of a texture, with a strategy ensuring that the half-peak magnitude supports of the filter responses in the frequency domain touch each other [12]. Because the filter bank is independent of the application domain, such an approach can be used for object classification, detection, and tracking. The main problems of this filter design approach are small filter pool sizes, no use of prior knowledge, and poor performance. Learning-based Gabor filter design approaches select the Gabor filters according to their application domain. Du-Ming Tsai proposed an optimization algorithm for Gabor filters using simulated annealing to obtain the best Gabor filter for texture segmentation [16]. S.Z. Li proposed a face recognition application using a strong classifier built as a cascade of weak classifiers; in his approach, the weak classifiers were constructed based on both the magnitude and phase features from Gabor filters [19].

In terms of vehicle detection, Alberto Broggi introduced a multi-resolution ve- 
hicle detection approach, and proposed dividing the image into three fixed ROIs 
[2]. His approach allows detecting multiple vehicles in a predefined region. However, it relies on a symmetry axis, which is time-consuming to compute, and symmetry features are themselves somewhat problematic. In [14], Zehang Sun
proposed an Evolutionary Gabor Filter Optimization (EGFO) approach for vehicle 
detection, and used the statistical features of the response of selected Gabor filters to 
classify the test image using a trained SVM classifier. Although good performance 
has been reported, EGFO has large computational cost for the selection of a Gabor 
filter. Moreover, each Gabor filter is optimized for a complete image, but it is applied 
to each sub-window of a test image, which reduces the quality of the representation. 

The requirements of Vehicle Active Safety Systems (VASS) are strict with re- 
spect to the time performance for pedestrian detection and vehicle detection. Ac- 
cordingly, in our approach we detect vehicles only in ROIs, allowing us to make a 
real-time implementation. The ROI approach largely prevents a near car from hiding 
a far car. All the hypotheses are generated in these regions. The positions of vehicles 
are validated by SVM classifiers and Gabor features. 


5.3 Generating Candidate ROIs 


Inspired by A. Broggi [2], we extract three ROIs: a near one, one in the middle, and a far one from a 640 x 480 image. But his approach uses fixed regions at the cost of flexibility. In our approach, ROIs are extracted using lane markings. In a structured lane, we detect the vanishing point using the lane edges. For real-time processing, we use a simple vanishing point detector rather than a complex one. Discontinuity and noise related problems can be solved by combining, for instance, 10 subsequent images (see Fig. 5.1(a)). Edge detection is done on combined images consisting of 10 overlapping subsequent images, and the equations of the two lane lines are deduced from a voting procedure similar to the Hough transform (HT) by analyzing horizontal and vertical edges. Four random points $P_{di}$, $d = l$ or $r$; $i = 0, \ldots, 3$, are selected on each lane line, and each tangent direction of two points (shown in (5.1)) between the closest 3 points

$$\{P_{di}, P_{dj}\}, \quad d = r \text{ or } l; \quad i, j \in \{0, 1, 2, 3\}; \quad i < j; \quad |j - i| \le 2, \tag{5.1}$$

is obtained by

$$\theta_{dij} = \overrightarrow{P_{di} P_{dj}}.$$

The tangent directions of the two lane lines are calculated as the average of the above tangent angles:

$$\bar{\theta}_d = \frac{\theta_{d01} + \theta_{d02} + \theta_{d12} + \theta_{d13} + \theta_{d23}}{5}. \tag{5.2}$$

Combining the average coordinates of the 4 interesting points, $\bar{P}_d = \frac{1}{4} \sum_{i=0}^{3} P_{di}$, with the average tangent angles $\bar{\theta}_d$, we can get the equations of the two lane lines. The intersection point of the two lines is an approximation of the vanishing point; see Fig. 5.2. Next we consider how to extract ROIs from the original image. Considering vehicle height and the camera parameters, the top boundaries of all the ROIs are 10 pixels higher than the vertical coordinate of the vanishing point. From the analysis of the camera parameters and image resolution, the heights of the near, middle, and far ROIs are 160, 60, and 30 pixels, respectively. The left and right boundaries of the near ROI are those of the image. The distance between the left boundary of the middle ROI and that of the image is one-third of the distance between the vanishing point and the left boundary of the image, and the right boundary of the middle ROI is determined similarly. The distance between the left boundary of the far ROI and that of the image is two-thirds of the distance between the vanishing point and the left boundary of the image, and the right boundary of the far ROI is determined in the same way. Figure 5.2(b) shows the results of each ROI.

(a) Lane edge using single frame. (b) Overlapping of lane edge.

Fig. 5.1 Edge detection results using single frame and multi-frame
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(a) Linking lines between interesting (b) Division of ROI. 
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(c) Lane ROI generation. 


Fig. 5.2 Vanishing point and ROI generation 
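The geometric construction of this section can be sketched in a few lines. The code below intersects two lane lines, each given by an average point and an average tangent angle, and then derives the three ROIs from the vanishing point. It assumes image coordinates with the origin at the top-left corner and the vertical axis pointing down, so that "10 pixels higher" means a smaller row index; the function names and the `(left, top, right, bottom)` layout are illustrative choices.

```python
import math

def line_intersection(p1, theta1, p2, theta2):
    """Intersection of two lines, each given by a point and a direction angle
    in radians. Used here to approximate the vanishing point."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    det = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(det) < 1e-12:
        raise ValueError("lane lines are parallel in the image")
    # Solve p1 + t*d1 = p2 + s*d2 for t with Cramer's rule
    t = ((p2[0] - p1[0]) * d2[1] - (p2[1] - p1[1]) * d2[0]) / det
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

def make_rois(vp, width, height, margin=10, heights=(160, 60, 30)):
    """Return the near, middle and far ROIs as (left, top, right, bottom)."""
    vx, vy = vp
    top = vy - margin                  # shared top boundary of all ROIs
    rois = []
    for k, h in enumerate(heights):    # k = 0 near, 1 middle, 2 far
        left = k * vx / 3.0            # 0, 1/3, 2/3 of the way to the vanishing point
        right = width - k * (width - vx) / 3.0
        rois.append((left, top, right, min(top + h, height)))
    return rois
```

Because the side boundaries are tied to the vanishing point rather than to fixed image columns, the ROIs follow the road when the vehicle changes its position in the lane or the road curves.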


5.4 Multi-resolution Vehicle Hypothesis 


For traditional approaches, the edges of a large object mask those of small objects; Fig. 5.3(b) is the result of a global histogram of horizontal and vertical edges, and it shows no peak for the small vehicle. With the preceding candidate regions, the histogram of an ROI shows a peak for a small object, as shown in Fig. 5.4. The analysis of the peaks and valleys of an edge histogram results in several rectangles, each of which represents a vehicle hypothesis. We use prior knowledge to eliminate some hypotheses. The minimum width of a vehicle can be set for each ROI. If the width of a hypothesis is smaller than this width, the hypothesis is eliminated. Additionally, the aspect ratio (width/height) of vehicles lies in a certain range; we assume this range to be [0.67, 2.2]. Rectangles with other ratios are eliminated. Since the histogram is built only from edges inside the ROIs, other objects, such as power cables and traffic signs above the road, lie outside the ROIs and do not disturb the edge histogram, which reduces the false positive rate (see Fig. 5.5). The coordinates of all hypothesis objects are translated into the coordinates of the original image, where the hypotheses of different ROIs may overlap. According to the distance between two rectangles, $d(r_0, r_1)$, we judge whether the two rectangles ought to be merged into one. Equation (5.3) defines the distance between rectangles $r_0$ and $r_1$; here $(x_{ij}, y_{ij})$ are the coordinates of the $j$th vertex of the $i$th rectangle, $i = 0, 1$; $j = 0, 1, 2, 3$. Through the above process, we finish the generation of vehicle hypotheses:

$$d(r_0, r_1) = \|r_0 - r_1\|_2, \qquad r_i = (x_{i0}, y_{i0}, x_{i1}, y_{i1}, x_{i2}, y_{i2}, x_{i3}, y_{i3}), \quad i = 0, 1. \tag{5.3}$$

(a) Original image. (b) Edge histogram.

Fig. 5.3 Global statistical histogram of horizontal and vertical edges

(a) Near layer image. (b) Far layer image.

Fig. 5.4 Histogram of horizontal and vertical edges for the near and far ROIs

(a) Hypothesis generation result without using ROI. (b) Hypothesis generation result using ROI.

Fig. 5.5 Comparison of hypothesis generation results

In conclusion, in our multi-resolution hypothesis generation approach, the ROIs complement each other; moreover, appropriate constraints improve the search efficiency, which greatly reduces the computing burden of hypothesis generation. Note that this heuristic multi-resolution approach works well in real time; an alternative is Efficient Subwindow Search (ESS), which was proposed to localize objects using a branch-and-bound optimization algorithm [8].
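The prior-knowledge filtering and the rectangle distance of (5.3) can be sketched as follows. Rectangles are assumed to be axis-aligned and stored as `(left, top, right, bottom)`; the Euclidean norm is assumed in (5.3); and merging two close rectangles by averaging their coordinates is a hypothetical choice, since the text only states that close rectangles are merged into one.

```python
import math

def keep_hypothesis(rect, min_width, ratio_range=(0.67, 2.2)):
    """Apply the minimum-width and aspect-ratio (width/height) priors."""
    w, h = rect[2] - rect[0], rect[3] - rect[1]
    if w < min_width or h <= 0:
        return False
    return ratio_range[0] <= w / h <= ratio_range[1]

def vertices(rect):
    """Stack the four vertices as r_i = (x_i0, y_i0, ..., x_i3, y_i3)."""
    l, t, r, b = rect
    return (l, t, r, t, r, b, l, b)

def rect_distance(r0, r1):
    """d(r0, r1) = ||r0 - r1|| on the stacked vertex coordinates, Eq. (5.3)."""
    return math.dist(vertices(r0), vertices(r1))

def merge_hypotheses(rects, max_dist):
    """Greedy merge: a rectangle closer than max_dist to an already kept one
    is averaged into it; otherwise it is kept as a new hypothesis."""
    merged = []
    for r in rects:
        for k, m in enumerate(merged):
            if rect_distance(r, m) < max_dist:
                merged[k] = tuple((a + b) / 2.0 for a, b in zip(m, r))
                break
        else:
            merged.append(tuple(r))
    return merged
```

In use, `keep_hypothesis` would be applied per ROI with that ROI's minimum width, and `merge_hypotheses` afterwards on the combined list in original-image coordinates.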


5.5 Vehicle Validation using Gabor Features and SVM 


After vehicle hypothesis generation, we are ready to validate the hypotheses. The preprocessing of the original image includes image scaling to 32 x 32, smoothing, histogram equalization, and image division. Afterwards, the vehicle is represented with Gabor features, and the feature vector of a vehicle is the input of the SVM classifier. Based on the classifier output, we determine whether the hypothesis represents a vehicle or a non-vehicle.


5.5.1 Vehicle Representation 


We first introduce some necessary definitions for Gabor filters and basic concepts for vehicle representation. The 2D Gabor function can be defined as follows:

$$G_{\{f, \varphi, \sigma_u, \sigma_v\}}(u, v) = \frac{1}{2\pi \sigma_u \sigma_v} \exp\left[-\frac{1}{2}\left(\frac{U^2}{\sigma_u^2} + \frac{V^2}{\sigma_v^2}\right)\right] \exp[2\pi j f U], \tag{5.4}$$

where

$$U = (u, v) \cdot (\cos\varphi, \sin\varphi) = u\cos\varphi + v\sin\varphi,$$
$$V = (-u, v) \cdot (\sin\varphi, \cos\varphi) = -u\sin\varphi + v\cos\varphi.$$

Here $f$ is the normalized spatial frequency of the complex sinusoidal signal modulating the Gaussian function, $\varphi$ is the direction of the Gabor filter, and $\sigma_u$ and $\sigma_v$ are the scale parameters of the filter. Therefore, $\{f, \varphi, \sigma_u, \sigma_v\}$ represents the parameters of a Gabor filter. A Gabor filter is a bandpass filter, and the first step of vehicle detection is to select the Gabor filters that respond strongly to the object to be detected.

Gabor features can be obtained by convolving the input image $I(u, v)$ ($(u, v) \in \Omega$, where $\Omega$ is the image pixel set) with a 2D Gabor filter $g(u, v)$:

$$r(u, v) = \iint_{\Omega} I(\xi, \eta)\, g(u - \xi, v - \eta)\, d\xi\, d\eta. \tag{5.5}$$
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Table 5.1 Selection of the optimized Gabor features 


(i) Give the test error rates for the mth sub-window as $\{(x_i, y_i)\}_{i=0}^{n}$, where $x_i$ is the parameter vector and $y_i$ is the error rate; $Y_0 = \{y_0, y_1, \ldots, y_n\}$; $P_0 = \{\}$;
(ii) Select the optimized filters
    For $t = 0, 1, 2, 3$
        Here: index = $\arg\min_i \{y_i \in Y_t\}$
              $Y_t = Y_t - \{y_{\text{index}}\}$
              if $\|x_{\text{index}} - x_j\| > \varepsilon$ for all $x_j \in P_t$
              then $P_{t+1} = P_t + \{x_{\text{index}}\}$
              else goto Here
(iii) Get the best Gabor filter bank for the mth sub-window.
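The selection procedure of Table 5.1 can be read as a greedy search: repeatedly take the filter with the lowest error rate and accept it only if its parameter vector is far enough from every filter already chosen. A minimal sketch of that reading is given below; the Euclidean distance between parameter vectors and the function name are assumptions.

```python
import math

def select_filters(params, errors, eps, k=4):
    """Pick k filters with the lowest error rates whose parameter vectors
    are pairwise more than eps apart (Euclidean distance assumed).

    params : list of parameter vectors x_i, e.g. (f, phi, sigma_u, sigma_v)
    errors : list of error rates y_i, one per filter
    Returns the indices of the chosen filters in order of selection.
    """
    chosen = []
    # Visiting filters by increasing error replaces the "goto Here" loop.
    for idx in sorted(range(len(errors)), key=errors.__getitem__):
        if all(math.dist(params[idx], params[j]) > eps for j in chosen):
            chosen.append(idx)
            if len(chosen) == k:
                break
    return chosen
```

The distance test keeps the bank diverse: two filters with nearly identical parameters respond to the same image structure, so accepting both would add little information to the feature vector.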


Although the linear response in (5.5) could be used directly as a feature, few scholars do so. The commonly used Gabor features include the thresholded Gabor feature, the Gabor-energy feature, the complex moment Gabor feature, and the grating cell operator feature. In our approach, we adopt the complex moment features of a Gabor filter response as the feature vector of our classifier.

We select the filter parameters with the strongest response for a certain sub-window that includes a vehicle part, and use an SVM as the performance estimation classifier. The test image is divided into 9 overlapping sub-windows, and the vehicle is represented by the statistical Gabor features computed from the convolution between a sub-window image patch and a Gabor filter: the mean $\mu$, the standard deviation $\sigma$, and the skewness $\kappa$ [14]. We optimize the SVM parameters for each of the 9 sub-windows, and test the resulting 9 classifiers for each sub-window using test examples. Then we record the average error rate. Finally, for each sub-window, the 4 Gabor filters with the minimum average error rate are combined into a filter bank for extracting a feature vector (see Table 5.1).

Then the 9 sub-windows with 4 Gabor filters each make a feature vector of size 108,

$$[\mu_{11}, \sigma_{11}, \kappa_{11}, \mu_{12}, \sigma_{12}, \kappa_{12}, \ldots, \mu_{93}, \sigma_{93}, \kappa_{93}, \mu_{94}, \sigma_{94}, \kappa_{94}],$$

where $\mu_{ij}$, $\sigma_{ij}$, $\kappa_{ij}$ are the mean, standard deviation, and skewness, respectively; $i$ is the index of a sub-window, and $j$ is the index of a filter.
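To make the construction of the 108-dimensional vector concrete, the sketch below samples the Gabor function (5.4) on a small grid, applies the discrete form of the convolution (5.5) to each sub-window, and collects the mean, standard deviation and skewness of the response. Several choices are assumptions made for illustration: the statistics are taken over the response magnitude, the nine overlapping sub-windows are 16 x 16 patches placed every 8 pixels in a 32 x 32 image, the kernel is truncated to 9 x 9 samples, and pixels outside a patch count as zero.

```python
import cmath
import math

def gabor_kernel(f, phi, su, sv, half=4):
    """Sample (5.4) on a (2*half+1) x (2*half+1) grid centred at the origin."""
    k = {}
    for u in range(-half, half + 1):
        for v in range(-half, half + 1):
            U = u * math.cos(phi) + v * math.sin(phi)
            V = -u * math.sin(phi) + v * math.cos(phi)
            env = math.exp(-0.5 * (U**2 / su**2 + V**2 / sv**2)) / (2 * math.pi * su * sv)
            k[(u, v)] = env * cmath.exp(2j * math.pi * f * U)
    return k

def response_magnitudes(patch, kernel):
    """|r(u, v)| from the discrete form of (5.5), zero outside the patch."""
    rows, cols = len(patch), len(patch[0])
    out = []
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for (du, dv), g in kernel.items():
                a, b = u - du, v - dv
                if 0 <= a < rows and 0 <= b < cols:
                    acc += patch[a][b] * g
            out.append(abs(acc))
    return out

def moments(values):
    """Mean, standard deviation and skewness of a list of numbers."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    kappa = sum((x - mu) ** 3 for x in values) / n / sigma**3 if sigma > 0 else 0.0
    return mu, sigma, kappa

def sub_windows(image, size=16, step=8):
    """Nine overlapping windows of a 32 x 32 image (layout assumed)."""
    return [[row[c:c + size] for row in image[r:r + size]]
            for r in range(0, len(image) - size + 1, step)
            for c in range(0, len(image[0]) - size + 1, step)]

def feature_vector(image, banks):
    """banks[i] lists the 4 parameter tuples (f, phi, su, sv) of sub-window i."""
    feats = []
    for win, bank in zip(sub_windows(image), banks):
        for params in bank:
            feats.extend(moments(response_magnitudes(win, gabor_kernel(*params))))
    return feats
```

With 9 sub-windows, 4 filters per sub-window and 3 statistics per filter, `feature_vector` returns 9 x 4 x 3 = 108 values in the order given above. A production implementation would normally use an FFT-based or library convolution instead of the explicit loops shown here.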


5.5.2 SVM Classifier 


SVM is an efficient approach to find the optimal hyperplane in binary classification [3, 6, 17]. Here, the hyperplane has the maximum margin between the two classes, which ensures not only the minimum empirical risk, but also the minimum Vapnik-Chervonenkis (VC) confidence.

Let $\{x_i, y_i\}$, $i = 0, 1, \ldots, L-1$, $y_i \in \{-1, 1\}$, $x_i \in \mathbb{R}^D$, be training samples. Assume that the hyperplane separates the positive samples from the negative ones. Then, a point $x$ on the hyperplane satisfies

$$w \cdot x + b = 0, \tag{5.6}$$

where $w$ is the normal to the hyperplane, and $|b| / \|w\|$ is the distance from the origin to the hyperplane. We define the margin of a separating hyperplane to be the sum of the shortest distances from the hyperplane to the closest positive sample and to the closest negative sample. For the linearly separable case, the support vector machine seeks the hyperplane with the largest margin. Therefore, all training data should satisfy the following constraints

$$y_i (x_i \cdot w + b) - 1 \ge 0, \quad \forall i. \tag{5.7}$$

Now we consider two different types of points. For the points on the hyperplane $H_1$, we have

$$x_i \cdot w + b = 1. \tag{5.8}$$

Similarly, the points on the hyperplane $H_2$ satisfy the following equation

$$x_i \cdot w + b = -1. \tag{5.9}$$

The hyperplanes $H_1$ and $H_2$ lie at perpendicular distances $|1 - b| / \|w\|$ and $|1 + b| / \|w\|$ from the origin, and each lies at distance $1 / \|w\|$ from the separating hyperplane. The margin between $H_1$ and $H_2$ is therefore

$$\text{margin} = \frac{1}{\|w\|} + \frac{1}{\|w\|} = \frac{2}{\|w\|}. \tag{5.10}$$

Thus we can obtain the optimal hyperplane by minimizing $\|w\|$, resulting in the maximum margin classifier. Moreover, we define the training points on the hyperplanes $H_1$ and $H_2$ to be the support vectors, which are marked with extra circles in Fig. 5.6.

Fig. 5.6 An illustration of a max margin classifier and support vectors

Now we introduce Lagrange multipliers $\alpha_i \ge 0$, one for each constraint in (5.7), and form the Lagrangian of the problem:

$$L_P = \frac{1}{2}\|w\|^2 - \sum_{i=0}^{L-1} \alpha_i y_i (x_i \cdot w + b) + \sum_{i=0}^{L-1} \alpha_i. \tag{5.11}$$

Minimizing $L_P$ w.r.t. $w$ and $b$, we have

$$w = \sum_{i=0}^{L-1} \alpha_i y_i x_i, \qquad \sum_{i=0}^{L-1} \alpha_i y_i = 0. \tag{5.12}$$

Substituting (5.12) into (5.11), we have

$$L_D = \sum_{i=0}^{L-1} \alpha_i - \frac{1}{2} \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j. \tag{5.13}$$

Note that $L_P$ in (5.11) and $L_D$ in (5.13) arise from the same objective function but with different constraints [3]. For the linearly separable case, we can obtain the support vectors from (5.13) by maximizing $L_D$ w.r.t. $\alpha_i$. In the solution, the points with $\alpha_i > 0$ are the support vectors which determine the hyperplane. Finally, we obtain $w$ and $b$ from (5.12) and (5.6).

We now discuss the non-separable SVM. To make the method flexible, the inner products in (5.13) can be substituted by a kernel function $K[x_i, x_j]$. Thus, we have

$$L_D = \sum_{i=0}^{L-1} \alpha_i - \frac{1}{2} \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} \alpha_i \alpha_j y_i y_j K[x_i, x_j]. \tag{5.14}$$

Kernel choice: Many kernels have been investigated for computer vision and pattern recognition, including the following:

Polynomial kernel:

$$K[x_i, x_j] = (x_i \cdot x_j + 1)^p. \tag{5.15}$$

RBF kernel:

$$K[x_i, x_j] = e^{-\frac{\|x_i - x_j\|^2}{2\sigma^2}}. \tag{5.16}$$

Sigmoid kernel:

$$K[x_i, x_j] = \tanh(\kappa\, x_i \cdot x_j - \delta). \tag{5.17}$$

We use radial basis functions as kernel functions. Once training has maximized the margin between the two classes, the decision function of the classifier is

$$f(x) = \sum_{i=0}^{L-1} y_i \alpha_i k[x, x_i] + b, \tag{5.18}$$

where $x, x_i \in \mathbb{R}^N$ are $N$-dimensional input feature vectors, $L$ is the number of labeled examples, $y_i \in \{-1, 1\}$ is the label of the $i$th example, and $k[x, x_i]$ is the kernel function. For training the classifier, we selected 500 images from our vehicle database, which was collected in Xi'an in 2005. They contain 1020 positive examples and 1020 negative examples. When testing the classifier, we get an average correct detection rate above 90% using 500 negative and positive examples independent of the training examples, and the missing and error detection rate is below 10%. Figure 5.7 shows the database, some false positive examples, and some false negative examples; Fig. 5.9 describes the ROC curve of the classifier.

Fig. 5.7 The XJT AI&R vehicle examples database (the left two images) and the false examples of our detector (the right two images)
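The decision rule (5.18) with the RBF kernel (5.16) is simple to evaluate once training has produced the support vectors, their multipliers and the offset. The sketch below shows only this evaluation step; the multipliers `alphas`, the offset `b` and the width `sigma` would come from a trained model, and the values used in the usage note are purely hypothetical.

```python
import math

def rbf_kernel(x, z, sigma):
    """k[x, z] = exp(-||x - z||^2 / (2 sigma^2)), Eq. (5.16)."""
    return math.exp(-math.dist(x, z) ** 2 / (2.0 * sigma**2))

def decision(x, support_vectors, labels, alphas, b, sigma):
    """f(x) = sum_i y_i alpha_i k[x, x_i] + b, Eq. (5.18)."""
    return sum(y * a * rbf_kernel(x, sv, sigma)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b

def classify(x, *model):
    """Return +1 (vehicle) or -1 (non-vehicle) from the sign of f(x)."""
    return 1 if decision(x, *model) >= 0 else -1
```

With a toy model `([(0.0, 0.0), (2.0, 2.0)], [-1, 1], [1.0, 1.0], 0.0, 1.0)`, a point near the first support vector is classified as -1 and a point near the second as +1. Only examples with $\alpha_i > 0$ need to be stored, which is why the sum runs over support vectors rather than over the whole training set.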


5.6 Boosted Gabor Features 


To reduce the computational burden and improve the performance in vehicle detection, we propose a supervised learning approach based on boosted Gabor features. A similar attempt to select Gabor features is described in [13]. However, the chosen Gabor feature set in that study is larger than ours; moreover, the SVM is only used to classify objects in the recognition step rather than in the preceding training step, which may result in performance deterioration. In contrast, we use the SVM as a classifier in both the training step and the classification step.


5.6.1 Boosted Gabor Features Using AdaBoost 


5.6.1.1 Gabor Feature 


We first introduce some necessary definitions for Gabor filters and basic concepts for vehicle representation. The 2D Gabor function can be defined as follows:

$$G_p(u, v) = \frac{1}{2\pi \sigma_u \sigma_v}\, e^{-\frac{1}{2}\left(\frac{U^2}{\sigma_u^2} + \frac{V^2}{\sigma_v^2}\right)}\, e^{2\pi j f U}, \tag{5.19}$$

where

$$\begin{aligned} U &= (u, v) \cdot (\cos\varphi, \sin\varphi) = u\cos\varphi + v\sin\varphi, \\ V &= (-u, v) \cdot (\sin\varphi, \cos\varphi) = -u\sin\varphi + v\cos\varphi, \end{aligned} \tag{5.20}$$

and $f$ is the radial frequency of the complex sinusoidal signal modulating the Gaussian function, $\varphi$ is the direction of the Gabor filter, $\sigma_u$ and $\sigma_v$ are the scale parameters of the filter, and $p = (f, \varphi, \sigma_u, \sigma_v) \in \mathbb{R}^4$ represents all the parameters of a Gabor filter. Clearly, for the image pixel set $\Omega$, Gabor features can be obtained by convolving the input image $I(u, v)$ ($(u, v) \in \Omega$) with a 2D Gabor filter $g(u, v)$:

$$r(u, v) = \iint_{\Omega} I(\xi, \eta)\, g(u - \xi, v - \eta)\, d\xi\, d\eta. \tag{5.21}$$

Although a linear feature could be directly used to represent an object, few scholars do that. The most often used Gabor features are thresholded Gabor features, Gabor-energy features, complex moment Gabor features, and grating cell operator features. In our approach, we adopt the complex moment features of a Gabor filter response as the feature vector of our classifier.


5.6.1.2 Boosted Gabor Features 


The selection of different Gabor features has some effect on detection performance; however, the primary goal in selecting Gabor filters is to find those that respond strongly to the object of interest. The filter parameters are adjusted to obtain the strongest response for sub-windows comprising a vehicle part. The image is divided into 9 overlapping sub-windows, and the vehicle is represented with statistical features, the mean $\mu$, the standard deviation $\sigma$, and the skewness $\kappa$, from the convolution between a sub-window and a Gabor filter [15].
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Table 5.2 BGF algorithm description 


Input:

Training examples 1: $(I_i, y_i)$, $1 \le i \le n$; Training examples 2: $(I_j, y_j)$, $n+1 \le j \le n+m$;
Gabor filters: $c_1, \ldots, c_N$; $y_i$ is the label of the $i$th example

Computation:

For each sub-window $s$
    For each Gabor filter $c$
        For each training example $(I_i, y_i)$
            $r(I_i, c, s) \to (\mu_i, \sigma_i, \kappa_i)_{c,s}$, $i = 1, \ldots, n$
        Train the SVM classifier using the features
        For each training example $(I_j, y_j)$
            $r(I_j, c, s) \to (\mu_j, \sigma_j, \kappa_j)_{c,s}$, $j = n+1, \ldots, n+m$
        Classify the training examples 2
    Do $T$ times
        Obtain one feature using the AdaBoost algorithm;

Output:

The 4 Gabor filters after 4 iterations with weights $\alpha_{i,j}$ for each sub-window.


Having obtained the Gabor features of an object, we now evaluate the performance of each Gabor feature. Boosting approaches aim at improving the accuracy of any given learning algorithm by focusing on "difficult" examples. The AdaBoost algorithm, proposed by Freund and Schapire, is one of the most popular variants of the basic boosting algorithm [7]. In our approach, it is used to boost the Gabor features for vehicle detection.

There are many Gabor features associated with a sub-window; however, only a few of
them are crucial for vehicle detection. Consequently, feature selection must be
performed on these Gabor features. In our approach, we optimize the SVM classifier
parameters for each of the 9 sub-windows, and then test the resulting 9 classifiers on
test examples, recording the classification rate of each Gabor filter for the 9
different sub-windows. According to the results, we perform the boosting task on a
larger set of Gabor features using the AdaBoost algorithm, where each round of
boosting selects one Gabor feature for a sub-window from the candidate features. After
T iterations, this yields T Gabor filters for each sub-window (see Table 5.2). In our
experiments, a total of 36 filters (4 for each of the 9 sub-windows) were combined into
a feature vector to represent vehicles. The detection performance of the BGF approach
using the features after 4 iterations is better than that using the features after 6
iterations.

Selecting a Gabor filter amounts to finding the optimal parameter set in the Gabor
parameter space $\mathbb{R}^4$,

$$\{p_1, \ldots, p_i, \ldots, p_N\},$$

where $p_i = (f_i, \varphi_i, \sigma_{u_i}, \sigma_{v_i})$, and $N$ is the number of
Gabor filters. For convenience of computation, the Gabor parameters are discretized.
We define the range of $f$ to be $[f_{\min}, f_{\max}]$. According to the Nyquist
theorem, the digital frequency $\omega = \pi$ corresponds to the maximum frequency of
a band-limited signal, and frequencies higher than $\pi$ will be distorted. We write
$\omega_{\max} = 2\pi f_{\max}/f_s$, so that the normalized maximum frequency is
$\omega_{\max}/2\pi$, where $f$ is the general frequency, $f_s$ is the sampling
frequency, and $f_k$ is the $k$th normalized frequency that can be discretized by





fy = [m zi fmax — fmin , vikas (im) 


L min 


where $L$ is the number of sample points and $a$ is the sample scale. For the direction
$\varphi$ of a filter, the filter response to an object in $[0, \pi]$ is the same as to
an object in $[\pi, 2\pi]$.

The sample interval for uniform sampling is $\Delta\varphi = 180/P$ degrees, where $P$
is the number of samples for $\varphi$. The scale parameters $\sigma_x$ and $\sigma_y$
are actually the effective size of a Gaussian function, and their ranges are equal, say
$[\sigma_{\min}, \sigma_{\max}]$. The upper limit is $\sigma_{\max} = W_s/5$, where
$W_s$ is the sub-window width, resulting in 98.7% energy in the range
$[-W_s/2, W_s/2]$ of $g$. At the same time, the lower limit $\sigma_{\min}$ equals 0,
and the number of samples for both $\sigma_x$ and $\sigma_y$ is $M$.

In our experiments, $a = 1.5588$, $L = 15$, $P = 15$, $M = 10$, $N = 22500$,
$W_s = 40$, $f \in [f_{\min}, f_{\max}] = [0, 0.5]$, and
$\sigma_x, \sigma_y \in [\sigma_{\min}, \sigma_{\max}] = [0, 8]$. Figure 5.8 shows our
boosted Gabor filters for vehicle detection.
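
The parameter values quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original experiments: it assumes that the filter bank is the full Cartesian product of the sampled values (so that the count is L·P·M·M) and that the quoted energy figure refers to the mass of a one-dimensional Gaussian inside plus or minus half the sub-window width. All variable names are illustrative.

```python
import math

# Values quoted in the text; the variable names are illustrative only.
L, P, M = 15, 15, 10      # samples for frequency, orientation, and each scale
W = 40                    # sub-window width in pixels

# Assumption: one filter per (f, phi, sigma_x, sigma_y) combination.
n_filters = L * P * M * M

delta_phi = 180.0 / P     # orientation step in degrees
sigma_max = W / 5.0       # upper limit of the Gaussian scale

# Mass of a 1-D Gaussian with sigma = W/5 inside [-W/2, W/2],
# which is the interval of +/- 2.5 sigma around the mean.
mass = math.erf((W / 2.0) / (sigma_max * math.sqrt(2.0)))
```

Under these assumptions the count reproduces N = 22500, the orientation step is 12 degrees, and the Gaussian mass is close to the 98.7% quoted in the text.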


5.6.2 Experimental Results and Analysis 


5.6.2.1 Vehicle Database for Detection and Tracking 


For vehicle detection and tracking, we collected a vehicle video database under two
kinds of conditions: general and hard. Across different weather, times of day, road
types, and congestion levels, we collected images of several kinds of vehicles, such as
sedans, trucks, and motorcycles. The host vehicle collecting the video operated at 3
different speeds: 40, 80, and 120 km/h. Table 5.3 summarizes the road conditions for
video collection, where × indicates hard conditions and √ indicates general conditions.


5.6.2.2 Boosted Gabor Features 


We carried out vehicle detection using our approach, the EGFO approach, and a
no-boosting approach. The distribution of the Gabor filter parameters
$(f, \varphi, \sigma_x, \sigma_y)$ is shown in Fig. 5.10. We can see that our boosted
Gabor filters differ from the Gabor filters optimized using EGFO because each Gabor
filter in our approach is optimized for a sub-window rather than for a complete image.
In terms of frequency, the boosted filters tend to have a low frequency due to large
structures in vehicles, like windows and bumpers. The directions of most of the boosted
Gabor filters are close to 0°, 45°, 90°, and 135° (see the second sub-figure of
Fig. 5.10), due to the prevalence of these angles in vehicles. In addition, from the
latter two sub-figures of Fig. 5.10, it follows that $\sigma_x$ is larger than
$\sigma_y$, in accordance with vehicles being wider than they are high. To summarize,
the choice of Gabor filter parameters depends heavily on the object to be detected: a
Gabor filter that works well for vehicles probably does not work for pedestrians, and
vice versa. Our Gabor filter selection optimized for vehicle detection results in
better performance than selections based on previous methods (Fig. 5.11). For training
the classifier, we selected 500 images from our vehicle database, which was collected
in Xi'an in 2005. They contain 1020 positive examples and 1020 negative examples. In
testing the classifier, we use 500 negative and positive examples independent of the
training examples.

Fig. 5.8 Boosted Gabor filters using AdaBoost algorithm: each row shows the boosted
Gabor filters for one sub-window, and the ith column represents the Gabor filter after
the ith iteration

Table 5.3 The collection conditions of vehicle video
Road types: curve, straight, upslope
Traffic: no traffic congestion, traffic congestion
Weather and lighting: sunny, cloudy, rain, after rain, against sun, night

Fig. 5.9 ROC curves for our vehicle classifier

Fig. 5.10 The distribution of the parameters of Gabor filters

Fig. 5.11 Vehicle detection based on hypothesis validation

To validate the performance of our BGF approach, vehicle detection experiments were
performed with three different optimization approaches for the 22,500 Gabor filters.
The experimental results show the Average Right Rate (ARR) of the no-boosting Gabor
feature approach to be 90%, that of the BGF approach to be 96%, and that of the EGFO
approach to be 91%. Figure 5.12(a) compares our detector with the other two approaches,
and the Receiver Operating Characteristics (ROC) curves that compare different boosting
approaches are shown in Fig. 5.9. These figures show that our vehicle detector has good
discrimination ability with a low decision bias compared with the no-boosting and EGFO
algorithms.


5.6.2.3 Vehicle Detection Results and Discussions 


We tested our vehicle detector on the collected video using the Springrobot platform
[10]. Figures 5.13 and 5.14 show the results of our vehicle detector under general and
hard conditions, respectively. We proposed an approach for vehicle classification and
detection that achieves good time performance by using vanishing points and ROIs, and
high detection accuracy by using Gabor features. Using the vanishing point to define
ROIs eliminates the disturbing effects of some non-vehicle objects, improving both the
detection rate and the robustness of the approach. The detection speed of our vehicle
detector is approximately 20 frames/second on a 2.4 GHz Pentium® 4 CPU for both the
general and hard conditions. The detection rate is defined by

$$\text{detection rate} = \frac{\sum_{i} (nt_i - nf_i)}{\sum_{j} nv_j}, \tag{5.22}$$

where $nt_i$ is the number of correct detections in the $i$th frame, $nf_i$ is the
number of erroneous detections in the $i$th frame, and $nv_j$ is the actual number of
vehicles in the $j$th frame. With this definition, our vehicle detection rate is above
90%.

Fig. 5.12 Two kinds of vehicle detection results: (a) ROC curves comparing different
Gabor filter optimization approaches; (b) recall-precision curves

In our approach, we have introduced a hypothesis-and-validation structure for vehicle
classification and detection. The experimental results so far show that the algorithm
works well on structured roads. Extension of this approach to unstructured roads needs
to be investigated. Additionally, under congestion, the constraints are too strong to
detect all vehicles, although all unobstructed vehicles are detected. Further research
work will focus on these problems.

Fig. 5.14 Vehicle detection results under the hard conditions


Chapter 6
Multiple-Sensor Based Multiple-Object 
Tracking 


6.1 Introduction 


As mentioned in previous chapters, vision sensors are capable of estimating the
relative position between the host vehicle and other vehicles and of determining the
shapes of obstacles and lanes. However, vision sensors are sensitive to weather and
lighting conditions. Moreover, with a single vision sensor it is difficult to estimate
the longitudinal distance, since perspective projection removes depth information.
A radar/lidar-based system is robust to weather and lighting conditions, and it readily
provides depth information. In short, radar/lidar and vision sensors have complementary
properties, and systems combining these sensors can remarkably improve overall
performance.


6.2 Related Work 


Multi-sensor multi-object detection and tracking systems have received considerable
attention over the last 5 years [1, 5, 6, 9-11, 14, 15, 17, 18]. In [17], a strategy
that distinguishes between static and moving objects by estimating object speed has
been proposed, where both the speed and the direction of the objects and the host
vehicle are used to estimate the speed. In [9], three different geometric object models
are designed for small objects, objects described by a rectangular shape like that of a
car, and free-form objects, respectively. In terms of obstacle classification and
tracking, the most commonly used sensor combination consists of a camera and a range
sensor [1, 9, 11, 17]. An approach that simplifies the fusion between range and vision
sensors using corresponding sets of hypotheses was proposed in [1]. In this system, a
radar device and a monocular camera are fused by sharing sets of hypotheses for the
detection of vehicles. In [5], a decentralized multiple-sensor multiple-target tracking
approach for the Autotaxi system is considered for avoiding collisions, where the
tracking involves three stages: data alignment, track-to-track association, and track
fusion. A sensor fusion strategy that introduces a depth cue into the segmentation
algorithm improves the target segmentation performance owing to the complementarity of
radar and vision [6]. As a pre-crash system, the SAVE-U project aims at protecting
pedestrians and bicyclists and avoiding collisions between pedestrians and vehicles,
where the sensor platform consists of radar sensors, normal cameras, and infrared
cameras. Alternatively, other sensor combinations, such as an infrared camera and a
radar [14], perform well in driver assistance systems.

In environment perception, the CHAUFFEUR Assistant system combined both radar and
video sensors, providing vehicle controllers with valuable data about preceding
vehicles and about the lane.

In the Highway Lane Change Assistant (HLCA), vision and radar sensors are combined to
detect dangerous objects in the neighboring lanes [16]; the system was evaluated by
different drivers in different vehicles. On German highways, a common vision-based lane
recognition system was shown to be affected by weather, so a fusion approach combining
vision and radar sensors was used to estimate road structures and the positions of the
vehicles in front [7]. Combining radar-based ACC and visual perception, a Hybrid
Adaptive Cruise Control (HACC) was created to first detect and track lanes and
vehicles; this information was afterwards used in the longitudinal controllers [8].


6.3 Obstacles Stationary or Moving Judgement Using Lidar 
Data 


A lidar sensor is often used as an on-board sensor for driver assistance systems.
Much effort has been devoted to clustering the raw data and classifying objects using
a lidar sensor [13, 17]. A method consisting of three modules (scan segmentation,
object classification, and object tracking) is used to detect and track multiple
objects with a lidar [13]. A strategy is proposed in [17] for distinguishing all
objects detected by a lidar and dividing them into three categories: moving objects,
roadside reflectors, and overhead signs, where the motion of detected objects is judged
by the relationship between the path of the host vehicle and the changes in the
positions of the objects. The position and size of obstacles are not sufficient to
assess their safety in IIDASW systems; the behaviors of all the obstacles on the road,
such as velocity and acceleration, should also be considered. We have developed an
algorithm to estimate the velocity of all obstacles.

First, we segment the lidar data into several clusters, and each cluster represents
one target. According to the distance between two laser points, we can judge whether
two points belong to the same object by the following criterion [13]:

$$r_{k,k+1} \le r_{\min}\, \frac{2 \tan\beta\, \sin(\phi/2)}{\cos(\phi/2) - \sin(\phi/2)\tan\beta}, \tag{6.1}$$

where $r_k$ is the distance of the $k$th point to the laser device,
$r_{k,k+1} = |r_k - r_{k+1}|$, $r_{\min} = \min\{r_k, r_{k+1}\}$, and $\phi$ is the
angular resolution. In our experiments, $\phi = 0.25^\circ$ and $\beta = 85^\circ$.
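
The segmentation criterion can be sketched in a few lines. This is an illustrative sketch rather than the implementation used on the platform: it assumes that a scan is given as a list of consecutive range readings, that the half-angle form of criterion (6.1) applies, and that a new cluster starts whenever the criterion fails. The function names are hypothetical.

```python
import math


def range_threshold_factor(phi_deg=0.25, beta_deg=85.0):
    """Right-hand factor of (6.1), so that the test is gap <= r_min * factor."""
    half = math.radians(phi_deg) / 2.0
    tan_beta = math.tan(math.radians(beta_deg))
    return 2.0 * tan_beta * math.sin(half) / (math.cos(half) - math.sin(half) * tan_beta)


def segment_scan(ranges, phi_deg=0.25, beta_deg=85.0):
    """Group indices of consecutive range readings into clusters using (6.1)."""
    if not ranges:
        return []
    factor = range_threshold_factor(phi_deg, beta_deg)
    clusters = [[0]]
    for k in range(len(ranges) - 1):
        gap = abs(ranges[k] - ranges[k + 1])
        r_min = min(ranges[k], ranges[k + 1])
        if gap <= r_min * factor:
            clusters[-1].append(k + 1)   # same object as the previous point
        else:
            clusters.append([k + 1])     # range jump: start a new cluster
    return clusters
```

With the quoted parameters the factor is roughly 0.05, so two neighboring points about 10 m away are grouped together when their ranges differ by less than about half a meter.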
Fig. 6.1 Our velocity estimation approach: (a) host vehicle velocity computation;
(b) moving object velocity computation


In many similar systems, the vehicle speed is measured by an encoder [2]. In
contrast, we propose a vehicle speed estimation algorithm that uses a static object,
given its two observation values $(r_1, \theta_1)$ and $(r_2, \theta_2)$, as follows:

$$v_h = \frac{\sqrt{r_1^2 + r_2^2 - 2 r_1 r_2 \cos(\theta_1 - \theta_2)}}{mT}, \tag{6.2}$$

where $T$ is the sampling interval and $m$ is the number of consecutive frames, which
is generally larger than 1 to improve the velocity accuracy. Here we assume that over a
small time interval $mT$ the driving direction of the host vehicle is consistent with
the $Y$-axis of the Cartesian coordinate system $XOY$, as shown in Fig. 6.1. After
estimating the velocity of the host vehicle, we can obtain the coordinates of two
segments, $P_0 = (x_0, y_0)$ and $P_1 = (x_1, y_1)$:

$$x_0 = -r_1 \cos\theta_1, \qquad y_0 = r_1 \sin\theta_1, \tag{6.3}$$

$$x_1 = -r_2 \cos\theta_2, \qquad y_1 = mT v_h + r_2 \sin\theta_2. \tag{6.4}$$

Then we can estimate the absolute velocity of the objects in the scene by the
following equation:

$$v_o = \frac{\sqrt{(y_1 - y_0)^2 + (x_1 - x_0)^2}}{mT}. \tag{6.5}$$
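
Equations (6.2) to (6.5) translate directly into code. The sketch below is illustrative only: angles are assumed to be in radians, the host is assumed to drive along the Y-axis as stated in the text, and the function names are hypothetical.

```python
import math


def host_speed(r1, theta1, r2, theta2, m, T):
    """Eq. (6.2): host speed from two polar observations of one static object."""
    chord = math.sqrt(r1 ** 2 + r2 ** 2 - 2.0 * r1 * r2 * math.cos(theta1 - theta2))
    return chord / (m * T)


def object_speed(r1, theta1, r2, theta2, v_h, m, T):
    """Eqs. (6.3)-(6.5): absolute speed of an object observed twice, m frames apart."""
    x0, y0 = -r1 * math.cos(theta1), r1 * math.sin(theta1)
    # The second observation is shifted by the host displacement m*T*v_h along Y.
    x1, y1 = -r2 * math.cos(theta2), m * T * v_h + r2 * math.sin(theta2)
    return math.hypot(x1 - x0, y1 - y0) / (m * T)
```

As a sanity check, a static object straight ahead whose range drops from 20 m to 18 m over 0.2 s gives a host speed of 10 m/s, and feeding the same two observations into `object_speed` returns a value close to zero, as expected for a static object.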
Fig. 6.2 Lidar clustering and speed estimation

Considering the noise and vibration of the lidar, we can judge whether an object is
moving or stationary by the Mahalanobis distance between the two segments $P_0$ and
$P_1$:

$$d = (P_0 - P_1)^T \Sigma^{-1} (P_0 - P_1),$$

where $\Sigma$ is a covariance matrix reflecting the uncertainty characteristics of the
lidar data. If $d < d_0$, the object is stationary; otherwise, the object is moving.
The decision rule can be interpreted geometrically as saying that the distance between
the two points, taking the variance into account, is less than $d_0$. Figure 6.2 shows
the results of velocity estimation using our algorithm, where the 13th, 14th, and 15th
objects are stationary for several consecutive frames.
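
The stationary-or-moving test can be sketched for two-dimensional points as follows. The covariance values and the threshold used in the usage note are assumptions chosen for illustration; the text does not state the values used in the experiments, and the function names are hypothetical.

```python
def mahalanobis_sq(p0, p1, cov):
    """d = (p0 - p1)^T cov^{-1} (p0 - p1) for 2-D points and a 2x2 covariance."""
    dx, dy = p0[0] - p1[0], p0[1] - p1[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return dx * (ia * dx + ib * dy) + dy * (ic * dx + id_ * dy)


def is_stationary(p0, p1, cov, d0):
    """Decision rule from the text: stationary if d < d0, moving otherwise."""
    return mahalanobis_sq(p0, p1, cov) < d0
```

For example, with an assumed position standard deviation of 0.2 m on each axis and an assumed threshold of 5.99, a displacement of (0.1, 0.1) m is classified as stationary while a displacement of 1 m is classified as moving.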


6.4 Multi-obstacle Tracking and Situation Assessment 


6.4.1 Multi-obstacle Tracking Based on EKF Using a Single 
Sensor 


6.4.1.1 Probability Framework of Tracking 


From the viewpoint of probability, tracking is a kind of statistical inference. In
other words, given the observation values from time 1 up to and including time $k$,
$Z_{1:k} = \{Z_1, Z_2, \ldots, Z_k\}$, we may construct the posterior probability
$P(X_k | Z_{1:k})$, and then obtain the estimate $\hat{X}_k$ and the covariance matrix
$P$ of the state vector $X_k$ at time $k$.

For the sake of simplicity, we make two assumptions: 


1. The state at the current time $k$ depends only on the state at the previous time
$k - 1$, which is called a first-order Markov process. Consequently, it yields the
following equation:

$$P(X_k | X_{k-1}, X_{k-2}, \ldots, X_1) = P(X_k | X_{k-1}).$$

2. The observation at the current time $k$ depends only on the current state:

$$P(Z_{1:k} | X_k) = P(Z_k | X_k)\, P(Z_{1:k-1} | X_k).$$


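Under these assumptions, we can deduce the Bayesian posterior probability

$$P(X_k | Z_{1:k}) = \frac{P(Z_k | X_k)\, P(X_k | Z_{1:k-1})}{P(Z_k | Z_{1:k-1})}, \tag{6.6}$$

where $P(X_k | Z_{1:k})$ is the posterior probability, $P(Z_k | X_k)$ is the likelihood,
$P(X_k | Z_{1:k-1})$ is the prior probability, and
$P(Z_k | Z_{1:k-1}) = \int P(Z_k | X_k)\, P(X_k | Z_{1:k-1})\, dX_k$ is the evidence,
which acts as a normalizing constant.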
For the probability framework of a tracking problem, we may proceed in the manner
described next:

1. Prediction step. Given $P(X_{k-1} | Z_{1:k-1})$, we can obtain $P(X_k | Z_{1:k-1})$
and $\hat{X}_{k|k-1}$.
2. Update step. Given $P(X_k | Z_{1:k-1})$ and $Z_k$, we can obtain $P(X_k | Z_{1:k})$
and $\hat{X}_{k|k}$.
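
The prediction and update steps can be illustrated on a toy problem with a finite number of states, where the integral in (6.6) becomes a sum. This is a minimal sketch under that discrete-state assumption, not the continuous-state filter developed later in this chapter; the function names and the numbers in the usage note are illustrative.

```python
def predict(prior, transition):
    """Prediction step: P(X_k | Z_{1:k-1}) from P(X_{k-1} | Z_{1:k-1}).

    transition[j][i] is P(X_k = i | X_{k-1} = j), i.e. the Markov assumption.
    """
    n = len(prior)
    return [sum(transition[j][i] * prior[j] for j in range(n)) for i in range(n)]


def update(predicted, likelihood):
    """Update step: Bayes rule (6.6) with likelihood[i] = P(Z_k | X_k = i)."""
    unnormalized = [l * p for l, p in zip(likelihood, predicted)]
    evidence = sum(unnormalized)          # P(Z_k | Z_{1:k-1})
    return [u / evidence for u in unnormalized]
```

With a uniform prior over two states, a transition matrix `[[0.9, 0.1], [0.2, 0.8]]`, and likelihoods `[0.8, 0.1]`, the predicted distribution is `[0.55, 0.45]` and the posterior puts about 91% of the mass on the first state.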


6.4.1.2 System Model 


In this system, we adopt the constant acceleration model to build the system equation

$$X_k = F X_{k-1} + G v, \qquad Z_k = h(X_k) + w. \tag{6.7}$$

The state vector at time $k$ is defined as

$$X_k = [x_k, \dot{x}_k, \ddot{x}_k, y_k, \dot{y}_k, \ddot{y}_k]^T,$$

where $x_k$, $\dot{x}_k$, $\ddot{x}_k$ are the position, velocity, and acceleration in
the $x$-direction at time $k$, and $y_k$, $\dot{y}_k$, $\ddot{y}_k$ are the position,
velocity, and acceleration in the $y$-direction at time $k$.
The state transition matrix can be written as

$$F = \begin{bmatrix}
1 & T & T^2/2 & 0 & 0 & 0 \\
0 & 1 & T & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & T & T^2/2 \\
0 & 0 & 0 & 0 & 1 & T \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}, \tag{6.8}$$

where $T$ is the sampling interval of a sensor. In (6.7), $v = [v_{a_x}, v_{a_y}]^T$ is
the process noise, modeled as zero-mean white noise whose correlation matrix is defined
by

$$Q = E\{v v^T\} = \begin{bmatrix} \sigma_{a_x}^2 & 0 \\ 0 & \sigma_{a_y}^2 \end{bmatrix}_{n \times n},$$

where $n$ is the dimension of $v$; here $n = 2$. The corresponding process noise
distribution matrix is

$$G = \begin{bmatrix}
T^2/2 & 0 \\
T & 0 \\
1 & 0 \\
0 & T^2/2 \\
0 & T \\
0 & 1
\end{bmatrix}. \tag{6.9}$$
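
The constant acceleration model can be written out with plain nested lists. This is an illustrative sketch: the helper names are hypothetical, and the noise variances passed in as `Q` are whatever the user assumes, since the text does not give numeric values.

```python
def ca_matrices(T):
    """Build F (6x6) and G (6x2) of the constant acceleration model for interval T."""
    block = [[1.0, T, T * T / 2.0],
             [0.0, 1.0, T],
             [0.0, 0.0, 1.0]]
    F = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            F[i][j] = block[i][j]            # x, x', x'' block
            F[i + 3][j + 3] = block[i][j]    # y, y', y'' block
    column = [T * T / 2.0, T, 1.0]
    G = [[0.0, 0.0] for _ in range(6)]
    for i in range(3):
        G[i][0] = column[i]                  # x-acceleration noise
        G[i + 3][1] = column[i]              # y-acceleration noise
    return F, G


def propagate(F, x):
    """Noise-free state propagation x_k = F x_{k-1}."""
    return [sum(F[i][j] * x[j] for j in range(6)) for i in range(6)]


def process_noise_cov(G, Q):
    """Process noise contribution G Q G^T (6x6) for a 2x2 matrix Q."""
    GQ = [[sum(G[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(6)]
    return [[sum(GQ[i][k] * G[j][k] for k in range(2)) for j in range(6)] for i in range(6)]
```

For instance, with T = 0.5 s a state with x-velocity 2 m/s and x-acceleration 4 m/s² moves 1.5 m in x over one step while its velocity rises to 4 m/s.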

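We define the observation value at time $k$ as

$$Z_k = \begin{bmatrix} r_k \\ \theta_k \end{bmatrix}, \tag{6.10}$$

and its observation function is

$$h(X_k) = \begin{bmatrix} \sqrt{x_k^2 + y_k^2} \\ \tan^{-1}(y_k / x_k) \end{bmatrix}. \tag{6.11}$$

Therefore, we can get the following form:

$$x_k = r_k \cos(\theta_k), \qquad y_k = r_k \sin(\theta_k). \tag{6.12}$$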


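In this system, the states of objects are in the Cartesian coordinate system, while
the observation values are in the polar coordinate system. Consequently, the
observation equation is nonlinear. We may now linearize $h(X)$ around
$X = \hat{X}_{k|k-1}$ and obtain the observation matrix

$$H_k = \left.\frac{\partial h}{\partial X}\right|_{X = \hat{X}_{k|k-1}} =
\begin{bmatrix}
\dfrac{x_{k|k-1}}{\sqrt{x_{k|k-1}^2 + y_{k|k-1}^2}} & 0 & 0 & \dfrac{y_{k|k-1}}{\sqrt{x_{k|k-1}^2 + y_{k|k-1}^2}} & 0 & 0 \\
\dfrac{-y_{k|k-1}}{x_{k|k-1}^2 + y_{k|k-1}^2} & 0 & 0 & \dfrac{x_{k|k-1}}{x_{k|k-1}^2 + y_{k|k-1}^2} & 0 & 0
\end{bmatrix}. \tag{6.13}$$

In (6.7), $w = [w_r, w_\theta]^T$ is the observation noise, modeled as zero-mean white
noise whose correlation matrix is defined by

$$R = E\{w w^T\} = \begin{bmatrix} \sigma_r^2 & 0 \\ 0 & \sigma_\theta^2 \end{bmatrix}_{n \times n},$$

where $n$ is the dimension of $w$; here $n = 2$.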
Combining the probability framework of tracking with the Minimum Mean Square Error
(MMSE) criterion, we can obtain the EKF's prediction equations

$$\hat{X}_{k|k-1} = F \hat{X}_{k-1|k-1}, \qquad P_{k|k-1} = F P_{k-1|k-1} F^T + G Q G^T, \tag{6.14}$$

where $\hat{X}_{k|k-1}$ is the state prediction at time $k$ given the state at time
$k - 1$ and $P_{k|k-1}$ is the prediction covariance. The update equations given $Z_k$
and $\hat{X}_{k|k-1}$ at time $k$ can be written in the form

$$\hat{X}_{k|k} = \hat{X}_{k|k-1} + W_k [Z_k - h(\hat{X}_{k|k-1})], \tag{6.15}$$

$$S_k = H_k P_{k|k-1} H_k^T + R, \qquad W_k = P_{k|k-1} H_k^T S_k^{-1}, \qquad P_{k|k} = P_{k|k-1} - W_k S_k W_k^T, \tag{6.16}$$

where $S_k$ is the observation prediction covariance, $W_k$ is the Kalman gain,
$\hat{X}_{k|k}$ is the updated state, and $P_{k|k}$ is the updated state covariance.
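
The update step for the polar observation model can be sketched with plain lists. This is an illustrative sketch, not the platform's code: it uses `atan2` instead of a bare arctangent so that all quadrants are handled, it wraps the bearing innovation to (-π, π], and it uses the standard covariance update $P_{k|k} = P_{k|k-1} - W S W^T$. These three choices and all function names are assumptions.

```python
import math


def h(x):
    """Observation function: range and bearing of the position (x[0], x[3])."""
    return [math.hypot(x[0], x[3]), math.atan2(x[3], x[0])]


def jacobian_h(x):
    """Observation matrix H evaluated at the predicted state."""
    px, py = x[0], x[3]
    r2 = px * px + py * py
    r = math.sqrt(r2)
    return [[px / r, 0.0, 0.0, py / r, 0.0, 0.0],
            [-py / r2, 0.0, 0.0, px / r2, 0.0, 0.0]]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inv2(S):
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def ekf_update(x_pred, P_pred, z, R):
    """One EKF update: returns the updated state and covariance."""
    H = jacobian_h(x_pred)
    PHt = matmul(P_pred, transpose(H))
    HPHt = matmul(H, PHt)
    S = [[HPHt[i][j] + R[i][j] for j in range(2)] for i in range(2)]
    W = matmul(PHt, inv2(S))                                  # Kalman gain (6x2)
    z_pred = h(x_pred)
    innov = [z[0] - z_pred[0], z[1] - z_pred[1]]
    innov[1] = math.atan2(math.sin(innov[1]), math.cos(innov[1]))  # wrap bearing
    x_new = [xi + W[i][0] * innov[0] + W[i][1] * innov[1] for i, xi in enumerate(x_pred)]
    WSWt = matmul(matmul(W, S), transpose(W))
    P_new = [[P_pred[i][j] - WSWt[i][j] for j in range(6)] for i in range(6)]
    return x_new, P_new
```

A measurement whose range exceeds the predicted range pulls the position estimate outward along the line of sight, and the position variances shrink after the update.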


6.4.1.3 Initial Conditions 


Concerning the initialization of the EKF, we determine the local tracks by using the 
acceleration of three points where it is assumed that motion of an object is modeled 
as having constant acceleration, finishing the initialization operation; for details we 
refer to [15]. 


6.4.1.4 Data Association for a Single Sensor 


For lidar and radar data, data association is the first of all steps when new data ar- 
rive, aiming at judging the corresponding relation between the current observation 
and the previous track. Our data association includes two categories: observation-to- 
observation and observation-to-track. The main objective of the association between 
observations is to initialize tracks correctly, while the association between an obser- 
vation and a track aims at holding and updating the existing tracks. 
Fig. 6.3 Various association gates: circle, sector, and ellipse


1. Observation-to-Observation Association. When a new object appears, we can hold the
observation directly. For a single point, we do not know the moving direction of the
object. In that case, when the object has an observation $z_1$, we use the circle
association gate to judge the correlation between $z_1$ and $z_2^{new}$ without a
moving direction (see Fig. 6.3).

To associate a new observation, the next problem is to compute the radius of the
association gate. The radius is defined as

$$\Delta r_{\max} = (v_h - v_o) \cdot T_s, \tag{6.17}$$

where $v_h$ is the velocity of the host vehicle and $v_o$ is the velocity of the
obstacle. If $|z_2^{new} - z_1| \le \Delta r_{\max}$, then $z_2^{new}$ is correlated
with $z_1$; otherwise, they are not correlated (see Fig. 6.3).

If there are two existing observation points of a certain object, $z_1$ and $z_2$, we
can use a sector association gate to judge the correlation between $z_3^{new}$ and
$z_2$ (see Fig. 6.3). If the following inequalities are satisfied,

$$r_s < |\overrightarrow{z_1 z_3^{new}}| < r_l, \qquad
|\arg(\overrightarrow{z_1 z_3^{new}}) - \arg(\overrightarrow{z_1 z_2})| < \theta, \tag{6.18}$$

then $z_3^{new}$ is located inside the association gate, which indicates that
$z_3^{new}$ and $z_2$ are correlated. Here $\theta$ is a threshold value.
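
The circle and sector gates can be sketched as two small predicates. The sketch assumes that observations have already been converted to Cartesian (x, y) tuples, takes the absolute value of the speed difference so that the circle radius is never negative, and wraps the angle difference to (-π, π]. These choices, the radial bound names `r_s` and `r_l`, and the function names are assumptions made for illustration.

```python
import math


def in_circle_gate(z1, z2_new, v_h, v_o, t_s):
    """Circle gate: accept if |z2_new - z1| <= |v_h - v_o| * t_s."""
    radius = abs(v_h - v_o) * t_s
    return math.hypot(z2_new[0] - z1[0], z2_new[1] - z1[1]) <= radius


def in_sector_gate(z1, z2, z3_new, r_s, r_l, theta):
    """Sector gate: bound the length and the direction of the vector z1 -> z3_new."""
    dx3, dy3 = z3_new[0] - z1[0], z3_new[1] - z1[1]
    dx2, dy2 = z2[0] - z1[0], z2[1] - z1[1]
    length = math.hypot(dx3, dy3)
    diff = math.atan2(dy3, dx3) - math.atan2(dy2, dx2)
    diff = math.atan2(math.sin(diff), math.cos(diff))   # wrap to (-pi, pi]
    return r_s < length < r_l and abs(diff) < theta
```

For example, with a relative speed of 10 m/s and a 0.1 s interval the circle gate has a radius of 1 m, so a new observation 0.5 m away is accepted and one 2 m away is rejected.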


2. Observation-to-Track Association. During tracking, we obtain the state update value
of a track $\hat{X}_{k-1|k-1}$ at time $k - 1$ and the state prediction value of the
track $\hat{X}_{k|k-1}$ at time $k$. Combining the observation value $z_k^{new}$ at
time $k$ with these two state values, we judge whether $z_k^{new}$ is associated with
$\hat{X}_{k|k-1}$ or not.

After the initialization of tracks, the state estimates of objects are obtained by
using a prediction-and-update model. In general, the longer a track has existed, the
smaller its estimation covariance. In our approach, we set an ellipse association gate
centered at $\hat{X}_{k|k-1}$ (see Fig. 6.3), and choose the motion direction of the
object as its major axis.

Define the new observation as

$$z_k^{new} = [r_k^{new}, \theta_k^{new}]^T.$$

Consequently, we obtain the Cartesian coordinates of the observation given the
observation value $z_k^{new}$ in the form

$$x_k^{new} = r_k^{new} \cos(\theta_k^{new}), \qquad y_k^{new} = r_k^{new} \sin(\theta_k^{new}). \tag{6.19}$$

Then we can obtain the state prediction value

$$\hat{X}_{k|k-1} = [x_{k|k-1}, \dot{x}_{k|k-1}, \ddot{x}_{k|k-1}, y_{k|k-1}, \dot{y}_{k|k-1}, \ddot{y}_{k|k-1}]^T$$

and the motion direction of the object

$$\theta_0 = \arctan\left(\frac{\dot{y}_{k|k-1}}{\dot{x}_{k|k-1}}\right).$$

Here $\theta_0$ is the rotation angle of the ellipse association gate.
On the basis of the previous results, we get the ellipse equation of the association
gate

$$\frac{x_e^2}{a^2} + \frac{y_e^2}{b^2} = 1, \tag{6.20}$$

where

$$x_e = (x - x_{k|k-1}) \cos\theta_0 + (y - y_{k|k-1}) \sin\theta_0, \qquad
y_e = (x - x_{k|k-1})(-\sin\theta_0) + (y - y_{k|k-1}) \cos\theta_0. \tag{6.21}$$

We may now define a distance function

$$d_{X,Z} = \frac{x_e^2}{a^2} + \frac{y_e^2}{b^2}, \tag{6.22}$$

where $a$ and $b$ are the half-lengths of the two axes of the ellipse. Substituting the
observation value $z_k^{new} = (x_k^{new}, y_k^{new})$ into (6.22), $d_{X,Z} \le 1$
indicates that $z_k^{new}$ and $\hat{X}_{k|k-1}$ are correlated, while $d_{X,Z} > 1$
indicates that $z_k^{new}$ is uncorrelated with $\hat{X}_{k|k-1}$.

On the basis of the distance functions of the above observation-to-observation and
observation-to-track associations, we build a distance matrix for all the points
passing the gates, and use the Global Nearest Neighbor (GNN) algorithm to associate
each observation with an observation or a track.
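
The ellipse gate distance of (6.20) to (6.22) can be sketched as a single function. It assumes Cartesian inputs and uses `atan2` for the motion direction so that all quadrants are covered; the function name and argument layout are hypothetical, and the GNN assignment step is not shown.

```python
import math


def ellipse_gate_distance(obs_xy, pred_xy, pred_vel, a, b):
    """Distance (6.22) of an observation from a predicted track position.

    obs_xy   -- observation (x, y) in Cartesian coordinates
    pred_xy  -- predicted track position (x, y)
    pred_vel -- predicted track velocity (vx, vy), which defines the major axis
    a, b     -- half-lengths of the ellipse axes along and across the motion
    """
    theta0 = math.atan2(pred_vel[1], pred_vel[0])
    dx, dy = obs_xy[0] - pred_xy[0], obs_xy[1] - pred_xy[1]
    x_e = dx * math.cos(theta0) + dy * math.sin(theta0)
    y_e = -dx * math.sin(theta0) + dy * math.cos(theta0)
    return (x_e / a) ** 2 + (y_e / b) ** 2
```

With a track moving along the diagonal and a = 2, b = 1, an observation displaced by (1, 1) lies along the major axis and gives a distance of 0.5 (inside the gate), whereas one displaced by (-1, 1) lies across the motion and gives 2 (outside the gate).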
6.4.1.5 Single Track Management


Single-track management is an important step of object tracking. In our approach, for
every observation point, if there are 3 correlated observation values among 5
consecutive values, we initialize the EKF, which starts a track.

Track holding keeps the tracks of objects continuously, according to predefined rules,
after the tracks have started. We use a sliding window detector to hold tracks, where
an N/M rule is used to judge whether a track exists. In other words, a track is
considered to exist if N out of M observation values are correlated. As the holding
time of a track increases, the belief in the track grows. Consequently, in an actual
implementation, M and N/M can be set to smaller values during the start period of a
track than during the later period of tracking. In our approach, M = 8 and N = 5.

To process a vanishing object, canceling of tracks is necessary. There are three
categories of tracks to be canceled. The first is a point without initialization: if 3
correlated consecutive observation values are not obtained, the track is canceled. The
second is a started track: if the N/M rule is violated, the track is canceled. The
third is a track made by an object moving in the reverse direction: when the object
moves behind the host car, the track can be canceled immediately. Figure 6.4 shows the
tracks of multiple objects using a radar sensor.
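
The N/M rule can be sketched with a fixed-length window of association results. The class below is hypothetical and encodes one possible policy: a track is canceled as soon as the number of misses inside the window exceeds M − N, because N hits can then no longer be reached within that window. The text does not specify how a partially filled window is treated, so this policy is an assumption.

```python
from collections import deque


class TrackWindow:
    """Sliding-window N/M detector for holding or canceling one track."""

    def __init__(self, n_required=5, window=8):
        self.n_required = n_required
        self.window = window
        self.hits = deque(maxlen=window)   # True for an associated observation

    def record(self, associated):
        """Store whether the latest scan produced an associated observation."""
        self.hits.append(bool(associated))

    def should_cancel(self):
        """Cancel once more than (M - N) of the stored scans are misses."""
        misses = len(self.hits) - sum(self.hits)
        return misses > self.window - self.n_required
```

With M = 8 and N = 5, a window containing five hits and three misses keeps the track alive, and two further misses push the miss count to five and cancel it.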


6.4.2 Lidar and Radar Track Fusion 


6.4.2.1 Data Alignment 


Since lidars and radars work independently and are unsynchronized, for multi- 
sensor fusion, we must first transform the different coordinates into the same co- 
ordinate system and then fuse the local tracks. Here we map the lidar coordinates 
and radar coordinates into the vehicle coordinates to solve the position alignment. 
Moreover, we synchronize time between lidar and radar by using the prediction 
equation of EKF. 


6.4.2.2 Track Association 


On the basis of the two local tracks of a lidar and a radar, $\hat{X}_l$ and
$\hat{X}_r$, we can determine the correspondence between the two local tracks. The
distance function is defined as [3, 5]

$$d_{lr} = (\hat{X}_l - \hat{X}_r)^T (P_l + P_r - P_{lr} - P_{rl})^{-1} (\hat{X}_l - \hat{X}_r). \tag{6.23}$$

In an actual implementation, we neglect the cross-covariance matrices between the lidar
and radar, $P_{lr}$ and $P_{rl}$; that is, $P_{lr} = P_{rl} = 0$.

Fig. 6.4 Multiple-object tracking using a radar sensor


Let $x = d_{lr}$, and suppose it has a $\chi^2$ distribution with $M$ degrees of
freedom, with the density in the form [3]

$$f(x) = \frac{1}{2^{M/2}\, \Gamma(M/2)}\, x^{M/2 - 1} e^{-x/2}, \tag{6.24}$$

where $\Gamma$ is the Gamma function with the following properties:

$$\Gamma(1/2) = \sqrt{\pi}, \qquad \Gamma(1) = 1, \qquad \Gamma(m + 1) = m\Gamma(m).$$

The probability of $x \in (0, \delta)$ may be written as

$$\alpha = \int_0^{\delta} f(x)\, dx.$$

On the basis of the $\delta$ corresponding to a given $\alpha$, we can set the ellipse
association gate

$$H_0: d_{lr} \le \delta, \qquad H_1: d_{lr} > \delta, \tag{6.25}$$

where $H_0$ indicates that $\hat{X}_l$ and $\hat{X}_r$ come from the same object, and
$H_1$ indicates that $\hat{X}_l$ and $\hat{X}_r$ come from two different objects.
Assuming that there are $N$ track pairs that pass the association gate, we rank the
track pairs by the corresponding distance value $d_{lr}$. Since one object has only one
track pair, we take the track pair with the minimum distance value $d_{lr}$.
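
The gate threshold for a chosen probability can be found numerically from the density (6.24). The following sketch integrates the density with composite Simpson's rule and inverts it by bisection; it is restricted to at least 2 degrees of freedom because the density is unbounded at zero for 1 degree of freedom. The threshold name `delta`, the function names, and the numerical method are assumptions; a statistics library would normally be used instead.

```python
import math


def chi2_pdf(x, dof):
    """Density (6.24) of the chi-square distribution with dof degrees of freedom."""
    half = dof / 2.0
    return x ** (half - 1.0) * math.exp(-x / 2.0) / (2.0 ** half * math.gamma(half))


def chi2_cdf(delta, dof, steps=2000):
    """alpha = integral of the density over (0, delta), by composite Simpson's rule."""
    if dof < 2:
        raise ValueError("this simple integrator needs dof >= 2")
    if delta <= 0.0:
        return 0.0
    h = delta / steps
    total = chi2_pdf(0.0, dof) + chi2_pdf(delta, dof)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * chi2_pdf(i * h, dof)
    return total * h / 3.0


def gate_threshold(alpha, dof):
    """Find delta such that P(x < delta) = alpha, by bisection."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    lo, hi = 0.0, float(dof)
    while chi2_cdf(hi, dof) < alpha:
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def same_object(d_lr, delta):
    """Hypothesis H0 of (6.25): the two local tracks come from the same object."""
    return d_lr <= delta
```

For 2 degrees of freedom the distribution function has the closed form 1 − exp(−x/2), which gives a threshold of about 5.99 for a probability of 0.95 and serves as a check on the numerical routine.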


6.4.2.3 Track Fusion Algorithm 


There now remains the problem of track fusion given the local tracks of the lidar and
radar, $\hat{X}_l$ and $\hat{X}_r$, and their covariance matrices $P_l$ and $P_r$. We
use the Maximum Likelihood Estimation (MLE) approach to fuse the tracks [5]. First of
all, we assume that the state estimation error has a Gaussian distribution, and then
obtain the state estimate and its covariance for the fused track in the form [5]

$$\hat{X}_g = P_g (P_l^{-1} \hat{X}_l + P_r^{-1} \hat{X}_r), \qquad
P_g = (P_l^{-1} + P_r^{-1})^{-1}. \tag{6.26}$$
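
The fusion rule can be illustrated for a two-dimensional state, where matrix inversion has a closed form. The sketch assumes that the cross-covariance between the two local tracks is neglected, as in the association step; restricting the state to two dimensions is a simplification made here for brevity, and the function names are hypothetical.

```python
def inv2(A):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def fuse_tracks(x_l, P_l, x_r, P_r):
    """Fuse two 2-D local estimates with independent Gaussian errors.

    Returns the fused state P_g (P_l^-1 x_l + P_r^-1 x_r) and the fused
    covariance P_g = (P_l^-1 + P_r^-1)^-1.
    """
    I_l, I_r = inv2(P_l), inv2(P_r)
    info = [[I_l[i][j] + I_r[i][j] for j in range(2)] for i in range(2)]
    P_g = inv2(info)
    y = [sum(I_l[i][j] * x_l[j] + I_r[i][j] * x_r[j] for j in range(2)) for i in range(2)]
    x_g = [sum(P_g[i][j] * y[j] for j in range(2)) for i in range(2)]
    return x_g, P_g
```

When one sensor is more precise in x and the other in y, the fused estimate follows the more precise sensor on each axis, and the fused variances are smaller than either input variance.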


Through the above process, we can obtain the Regions of Interest (ROIs) using global
tracks. Moreover, we can extract a more accurate environment structure using visual
information. In our approach, the CCD sensors implement lane recognition and vehicle
detection. Our lane recognition approach is the Adaptive Randomized Hough Transform
(ARHT) [12] described in Sect. 4.3, which implements robust and accurate detection of
lane markings without manual initialization or a priori information about the road
environment. The results of lane recognition provide the road structure and limit the
region of obstacles. In terms of vehicle detection, we use Gabor features to represent
and detect vehicles in ROIs [4]. Figure 6.6 shows the fusion results of the three
sensors on the Springrobot platform shown in Fig. 6.5.


6.5 Conclusion and Future Work 


In our approach, we have proposed an interactive road situation analysis framework
and its algorithmic implementation, namely the multiple-sensor multi-object detection
and tracking approach. We put emphasis on evaluating the future situation rather than
the current obstacle situation. Vehicle dynamics and driver behavior are considered as
two influencing factors for various IIDASW systems. In addition, compared with other
similar systems, our framework is a more integrated one, in which the control module
based on preview-following is included, yielding a concise and efficient framework.

Fig. 6.5 Intelligent driver assistance and safety warning platform (Springrobot)

There are also several questions that need to be further investigated in our future
work. For specific applications, deciding how to select and set up the sensor network
is also very important. Calibrating the sensors in our system currently requires many
manual operations. Automatic joint calibration of multiple sensors, including a camera,
a lidar, and a radar, is therefore desirable in all driver assistance and safety
warning systems.


Chapter 7
An Integrated DGPS/IMU Positioning Approach


7.1 Introduction 


For autonomous navigation, vehicles must be capable of determining their global and
local positions within their surrounding environment [10, 19, 27]. However, vehicle
localization is one of the challenging problems due to the following issues. First of
all, sensor noise gives rise to inaccurate position information in global localization.
If the vehicle could obtain accurate global position information, the localization
problem would be greatly simplified. Unfortunately, the precision of current GPS is
about several meters. Though differential GPS can provide a promising precision of
several centimeters, on-vehicle GPS terminals can hardly receive any signals in some
urban environments. In addition, the noise of other sensors (cameras, sonar, etc.) also
degrades localization considerably. Second, vehicle localization is not only to obtain
the absolute position, but also to capture the relative position relationship between a
host vehicle and its surrounding objects. This plays an important role in obstacle
avoidance.

The commonly used positioning sensors are the Global Navigation Satellite System
(GNSS), the Inertial Measurement Unit (IMU), and encoders. The basic elements of a GNSS
are the set of satellites, ground augmentation systems, and user equipment. There are
four GNSS in the world: Global Positioning System (GPS) [15, 16], GLONASS [17], Galileo
[3], and BeiDou/Compass [4, 14]. Among these, GPS is the most commonly used for vehicle
navigation and localization. As mentioned before, a GNSS cannot provide accurate
positioning information at any time or any place. Hence, an IMU and encoders, which
together support dead reckoning (DR), are combined with the GNSS to compensate for its
disadvantages. The GNSS and DR are usually mutually complementary. On the one hand, the
GNSS provides the absolute position to an IMU for both the initialization of the
vehicle position and for sensor correction. On the other hand, the result of an IMU can
compensate for the random errors of the GNSS. As a result, combining the two approaches
can overcome the disadvantages of each single approach, thus improving positioning
precision [1, 23].

Many problems in vehicle localization and navigation require estimating the
time-varying state of a system from noisy time sequences. The state-space
representation of a dynamic time sequence provides the state vectors of a system
and the relationship between state vectors and measurements. Moreover, to perform
inference on a dynamic system, both a system model and a measurement model are
necessary, in either deterministic or probabilistic form. For linear/Gaussian
systems, Kalman filters are linear least-squares estimators [12]. Although Kalman
filtering was originally developed for linear systems, the Extended Kalman Filter
(EKF) handles nonlinear cases by approximating the nonlinear functions with their
partial derivatives [18]. However, all of these approaches fail for multimodal
pdfs and heavily skewed distributions.

In such cases, the system models may be both nonlinear and non-Gaussian. Hence,
particle filtering, a.k.a. Sequential Monte Carlo filtering, is applied to
represent the state distribution using particles [11, 13]. In recent years, this
approach has been widely used in robot localization [11, 13, 22] and sensor
fusion [25].


7.2 Related Work 


GPS has been widely used in vehicle navigation [5, 7, 24]. However, positioning
systems based solely on GPS do not work well when the GPS signals are poor because
of either blockage by objects or an insufficient number of visible satellites.
Therefore, many real-world positioning systems integrate GPS with dead reckoning
(DR) sensors to provide better positioning solutions. Moreover, KF/EKF/UKF/PF
based techniques are widely used both to fuse the GPS and DR data and to
iteratively estimate the vehicle position [2, 28].

Over the last 30 years, integrated GPS/DR approaches have been widely used in
vehicle navigation [1, 27]. Abbott et al. presented a quantitative examination of
the impact that individual navigation sensors have on the performance of a vehicle
navigation system [1]. Gamini et al. investigated building an environment map
while simultaneously calculating the absolute position using this map in an
unknown environment [8]. Cui et al. proposed a GPS-based vehicle positioning
approach for urban canyon environments, where the GPS signals are easily blocked
by high buildings, yielding insufficient satellite coverage. To this end, the
authors presented a constrained method that approximates the vehicle path with
line segments, which reduces the minimum number of required satellites to two. In
recent years, Vehicular Ad Hoc Networks (VANETs) have played an important role in
communication for applications ranging from safe driving to assisted driving.
Boukerche et al. discussed the positioning requirements of the main VANET
applications based on data fusion techniques [5]. Moreover, the authors
investigated how to combine these positioning techniques using data fusion to
obtain robust positioning solutions in VANETs. In TELEcommunications and
inforMATICS (TELEMATICS) systems, the car navigation system plays a core role in
both safe and comfortable driving, and low-cost DR systems are critical for
extending the commercial navigation market. To this end, Cho et al. presented a
low-cost GPS/DR system in which the DR system consists of an accelerometer and a
gyro [6]. Moreover, the authors investigated the performance of three estimation
techniques, the EKF, the Sigma-Point KF (SPKF), and the Sigma-Point-based
Receding-HKF (SPRHKF), in various situations. Yang et al. proposed a nonlinear
filtering algorithm for GPS/DR positioning systems that combines the SR-KF and
the SR-UKF [24]. The experimental results show that the proposed algorithm has
both higher filtering precision and better stability than the EKF. In-car
positioning and navigation systems not only guide drivers from one location to
another using GPS/DR and a map, but also provide communication services. Skog et
al. presented data sources and fusion techniques for an in-car navigation system
[20] and discussed the advantages and disadvantages of the four commonly used
basic positioning sensors.

Fig. 7.1 The framework of the DGPS/IMU positioning system


7.3 An Integrated DGPS/IMU Positioning Approach 


In a GPS/IMU navigation system, the IMU generally provides position, velocity, and
pose, while GPS provides position information for correcting the IMU [9, 26]. In
our navigation system, however, we directly use the observed DGPS data as input to
the GPS/IMU data fusion, without requiring a separate DGPS filter. When DGPS does
not work well, the IMU provides the localization parameters, as shown in Fig. 7.1.
The robust DGPS/IMU data fusion and the IMU filter are described below.


7.3.1 The System Equation 


In this section, we assume that land vehicles are objects moving in a 2D plane. Let
$X = [e, n, \dot e, \dot n, \ddot e, \ddot n, \varepsilon_e, \varepsilon_n, \delta_g, \delta_s]^T$
denote the state vector, where $e$ and $n$ are the coordinates in the x- and
y-directions, respectively; $\dot e$, $\dot n$ and $\ddot e$, $\ddot n$ are the
velocities and accelerations in the x- and y-directions; $\varepsilon_e$ and
$\varepsilon_n$ are the position errors in the x- and y-directions, respectively;
and $\delta_g$ and $\delta_s$ are the relative rotation angle error of the gyro and
the distance error of the encoders.

From Newton's laws of motion, for motion from rest under constant acceleration we
have

$$ s = \tfrac{1}{2} a t^2, \qquad v = a t, \tag{7.1} $$

where $t$ is the time, $v$ is the velocity, $a$ is the acceleration, and $s$ is the
distance. From (7.1), we can see that both the distance and the velocity are
calculated from the acceleration. Although driving is quite complex because of
the effects of routes, road surfaces, drivers, and traffic jams, the state change
of a vehicle follows either directly or indirectly from the change in acceleration.
Therefore, it is important to model the acceleration change when building dynamic
models of the vehicle.

There are many operations in vehicle driving, such as making a turn, accelerating, 
decelerating and stopping a car, due to the complexity of urban traffic environment. 
Hence, the acceleration of a vehicle is represented using the “current” model [21,
28, 29], in which the acceleration change of the vehicle is a first-order
stationary Markov process

$$ \ddot e = \bar a_e + a_e, \qquad \dot a_e = -\tau_{a_e} a_e + w_{a_e}, \tag{7.2} $$
$$ \ddot n = \bar a_n + a_n, \qquad \dot a_n = -\tau_{a_n} a_n + w_{a_n}, \tag{7.3} $$

where $a_e$ and $a_n$ are zero-mean Singer acceleration processes; $\bar a_e$ and
$\bar a_n$ are the means of the acceleration in the x- and y-directions,
respectively; $w_{a_e}$ and $w_{a_n}$ are zero-mean white noises with constant
power spectral densities $2\tau_{a_e}\sigma_{a_e}^2$ and
$2\tau_{a_n}\sigma_{a_n}^2$, respectively; and $\tau_{a_e}$ and $\tau_{a_n}$ are
the multiplicative inverses of the correlation time constants.
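At the same time, we model the other errors as first-order Markov processes:

$$ \dot\varepsilon_e = -\tau_{\varepsilon_e}\varepsilon_e + w_{\varepsilon_e}, \qquad
   \dot\varepsilon_n = -\tau_{\varepsilon_n}\varepsilon_n + w_{\varepsilon_n}, $$
$$ \dot\delta_g = -\tau_{\delta_g}\delta_g + w_{\delta_g}, \qquad
   \dot\delta_s = -\tau_{\delta_s}\delta_s + w_{\delta_s}, \tag{7.4} $$

where $\tau_{\varepsilon_e}$, $\tau_{\varepsilon_n}$, $\tau_{\delta_g}$,
$\tau_{\delta_s}$ are the multiplicative inverses of the correlation time
constants, and $w_{\varepsilon_e}$, $w_{\varepsilon_n}$, $w_{\delta_g}$,
$w_{\delta_s}$ are zero-mean white noises. Hence, we obtain the discrete state
equation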


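$$ X_{k+1} = \Phi_{k+1,k} X_k + U_k + W_k, \tag{7.5} $$

where $T$ is the sampling period and

$$ \Phi_{k+1,k} = \begin{bmatrix}
I_{2\times2} & T I_{2\times2} & C_1 & 0_{2\times2} & 0_{2\times2} \\
0_{2\times2} & I_{2\times2} & C_2 & 0_{2\times2} & 0_{2\times2} \\
0_{2\times2} & 0_{2\times2} & E_1 & 0_{2\times2} & 0_{2\times2} \\
0_{2\times2} & 0_{2\times2} & 0_{2\times2} & E_2 & 0_{2\times2} \\
0_{2\times2} & 0_{2\times2} & 0_{2\times2} & 0_{2\times2} & F
\end{bmatrix}, $$

$$ C_1 = \mathrm{diag}\Big[\tfrac{1}{\tau_{a_e}^2}\big(-1 + \tau_{a_e}T + e^{-\tau_{a_e}T}\big),\;
                           \tfrac{1}{\tau_{a_n}^2}\big(-1 + \tau_{a_n}T + e^{-\tau_{a_n}T}\big)\Big], $$
$$ C_2 = \mathrm{diag}\Big[\tfrac{1}{\tau_{a_e}}\big(1 - e^{-\tau_{a_e}T}\big),\;
                           \tfrac{1}{\tau_{a_n}}\big(1 - e^{-\tau_{a_n}T}\big)\Big], $$
$$ E_1 = \mathrm{diag}\big[e^{-\tau_{a_e}T},\, e^{-\tau_{a_n}T}\big], \qquad
   E_2 = \mathrm{diag}\big[e^{-\tau_{\varepsilon_e}T},\, e^{-\tau_{\varepsilon_n}T}\big], \qquad
   F = \mathrm{diag}\big[e^{-\tau_{\delta_g}T},\, e^{-\tau_{\delta_s}T}\big], $$

$$ U_k = [u_1, u_2, u_3, u_4, u_5, u_6, 0_{1\times4}]^T, $$

$$ u_1 = \frac{1}{\tau_{a_e}}\Big(-T + \frac{\tau_{a_e}T^2}{2} + \frac{1 - e^{-\tau_{a_e}T}}{\tau_{a_e}}\Big)\bar a_e, \qquad
   u_2 = \frac{1}{\tau_{a_n}}\Big(-T + \frac{\tau_{a_n}T^2}{2} + \frac{1 - e^{-\tau_{a_n}T}}{\tau_{a_n}}\Big)\bar a_n, $$
$$ u_3 = \Big(T - \frac{1 - e^{-\tau_{a_e}T}}{\tau_{a_e}}\Big)\bar a_e, \qquad
   u_4 = \Big(T - \frac{1 - e^{-\tau_{a_n}T}}{\tau_{a_n}}\Big)\bar a_n, $$
$$ u_5 = \big(1 - e^{-\tau_{a_e}T}\big)\bar a_e, \qquad
   u_6 = \big(1 - e^{-\tau_{a_n}T}\big)\bar a_n. $$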
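The per-axis structure of (7.5) can be checked numerically. The following is a minimal sketch, not the implementation used in our system: the helper name `current_model_axis` and the values of `tau` and `T` are illustrative assumptions. It builds the 3x3 block acting on [position, velocity, acceleration] for one axis, together with the gains that multiply the mean acceleration (the coefficients of $u_1$, $u_3$, $u_5$ for the east axis).

```python
import numpy as np

def current_model_axis(tau: float, T: float):
    """Transition block and mean-acceleration gains for one axis.

    State ordering (assumed for this sketch): [position, velocity, acceleration].
    tau is the inverse correlation time, T the sampling period.
    """
    decay = np.exp(-tau * T)
    c1 = (-1.0 + tau * T + decay) / tau**2   # acceleration -> position (entry of C1)
    c2 = (1.0 - decay) / tau                 # acceleration -> velocity (entry of C2)
    phi = np.array([[1.0, T, c1],
                    [0.0, 1.0, c2],
                    [0.0, 0.0, decay]])      # decay is the entry of E1
    # Coefficients of the mean acceleration in u1, u3, u5.
    gain = np.array([(-T + tau * T**2 / 2.0 + (1.0 - decay) / tau) / tau,
                     T - (1.0 - decay) / tau,
                     1.0 - decay])
    return phi, gain

T = 0.1
phi, gain = current_model_axis(tau=0.5, T=T)
# When the acceleration equals its mean, the two contributions add up to
# plain constant-acceleration kinematics: [T^2/2, T, 1].
combined = phi[:, 2] + gain
```

The sum `combined` equals $[T^2/2, T, 1]$ for any `tau`, which is why replacing the mean acceleration by the current acceleration estimate leads to a constant-acceleration prediction matrix.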


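$W_k$ is a white noise sequence with $E[W_k W_{k+j}^T] = 0$ ($\forall j \neq 0$),
and its covariance matrix is

$$ Q_k = E[W_k W_k^T] = \begin{bmatrix}
Q_{11} & 0_{6\times2} & 0_{6\times2} \\
0_{2\times6} & Q_{12} & 0_{2\times2} \\
0_{2\times6} & 0_{2\times2} & Q_{13}
\end{bmatrix}, \tag{7.6} $$

where $Q_{11} = [q_{ij}]_{6\times6}$ with

$$ q_{11} = \frac{\sigma_{a_e}^2}{\tau_{a_e}^4}\Big[1 - e^{-2\tau_{a_e}T} + 2\tau_{a_e}T + \frac{2\tau_{a_e}^3T^3}{3} - 2\tau_{a_e}^2T^2 - 4\tau_{a_e}Te^{-\tau_{a_e}T}\Big], $$
$$ q_{13} = q_{31} = \frac{\sigma_{a_e}^2}{\tau_{a_e}^3}\Big[e^{-2\tau_{a_e}T} + 1 - 2e^{-\tau_{a_e}T} + 2\tau_{a_e}Te^{-\tau_{a_e}T} - 2\tau_{a_e}T + \tau_{a_e}^2T^2\Big], $$
$$ q_{15} = q_{51} = \frac{\sigma_{a_e}^2}{\tau_{a_e}^2}\Big[1 - e^{-2\tau_{a_e}T} - 2\tau_{a_e}Te^{-\tau_{a_e}T}\Big], $$
$$ q_{22} = \frac{\sigma_{a_n}^2}{\tau_{a_n}^4}\Big[1 - e^{-2\tau_{a_n}T} + 2\tau_{a_n}T + \frac{2\tau_{a_n}^3T^3}{3} - 2\tau_{a_n}^2T^2 - 4\tau_{a_n}Te^{-\tau_{a_n}T}\Big], $$
$$ q_{24} = q_{42} = \frac{\sigma_{a_n}^2}{\tau_{a_n}^3}\Big[e^{-2\tau_{a_n}T} + 1 - 2e^{-\tau_{a_n}T} + 2\tau_{a_n}Te^{-\tau_{a_n}T} - 2\tau_{a_n}T + \tau_{a_n}^2T^2\Big], $$
$$ q_{26} = q_{62} = \frac{\sigma_{a_n}^2}{\tau_{a_n}^2}\Big[1 - e^{-2\tau_{a_n}T} - 2\tau_{a_n}Te^{-\tau_{a_n}T}\Big], $$
$$ q_{33} = \frac{\sigma_{a_e}^2}{\tau_{a_e}^2}\Big[4e^{-\tau_{a_e}T} - 3 - e^{-2\tau_{a_e}T} + 2\tau_{a_e}T\Big], \qquad
   q_{35} = q_{53} = \frac{\sigma_{a_e}^2}{\tau_{a_e}}\Big[e^{-2\tau_{a_e}T} + 1 - 2e^{-\tau_{a_e}T}\Big], $$
$$ q_{44} = \frac{\sigma_{a_n}^2}{\tau_{a_n}^2}\Big[4e^{-\tau_{a_n}T} - 3 - e^{-2\tau_{a_n}T} + 2\tau_{a_n}T\Big], \qquad
   q_{46} = q_{64} = \frac{\sigma_{a_n}^2}{\tau_{a_n}}\Big[e^{-2\tau_{a_n}T} + 1 - 2e^{-\tau_{a_n}T}\Big], $$
$$ q_{55} = \sigma_{a_e}^2\big[1 - e^{-2\tau_{a_e}T}\big], \qquad
   q_{66} = \sigma_{a_n}^2\big[1 - e^{-2\tau_{a_n}T}\big], \qquad \text{the other } q_{ij} = 0, $$

$$ Q_{12} = \mathrm{diag}\big[\sigma_{\varepsilon_e}^2\big(1 - e^{-2\tau_{\varepsilon_e}T}\big),\;
                              \sigma_{\varepsilon_n}^2\big(1 - e^{-2\tau_{\varepsilon_n}T}\big)\big], $$
$$ Q_{13} = \mathrm{diag}\big[\sigma_{\delta_g}^2\big(1 - e^{-2\tau_{\delta_g}T}\big),\;
                              \sigma_{\delta_s}^2\big(1 - e^{-2\tau_{\delta_s}T}\big)\big]. $$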


7.3.2 The Measurement Equation 


In this section, we obtain the position and velocity, $e_r$, $n_r$, $\dot e_r$, and
$\dot n_r$, from the DGPS device, the distance $S$ from the odometer, and the
rotation angle $\Omega$ from the gyroscope. The relationships between the state
vector and the measurement vector are described as

$$ Z_k = [Z_{k,1}, Z_{k,2}, \ldots, Z_{k,6}]^T, \tag{7.7} $$

where

$$ Z_{k,1} = e_r(k) = e(k) + \varepsilon_e(k) + v_1(k), $$
$$ Z_{k,2} = n_r(k) = n(k) + \varepsilon_n(k) + v_2(k), $$
$$ Z_{k,3} = \dot e_r(k) = \dot e(k) + v_3(k), $$
$$ Z_{k,4} = \dot n_r(k) = \dot n(k) + v_4(k), $$


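$$ Z_{k,5} = \Omega(k) = \arctan\frac{\dot e(k)}{\dot n(k)} + \gamma - \beta(k-1) + \delta_g(k) + v_5(k) $$
$$ (\gamma = 0,\ -\pi \text{ or } \pi), \qquad \beta(k-1) = \alpha(0) + \sum_{i=1}^{k-1}\big[\Omega(i) + \delta_g(i)\big], $$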
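$$ Z_{k,6} = S(k) = T\sqrt{\dot e^2(k) + \dot n^2(k)} + \delta_s(k) + v_6(k). $$

Note that $v_i(k)$ ($i = 1, 2, \ldots, 6$) is a zero-mean Gaussian white noise
sequence with the covariance matrix
$R(k) = \mathrm{diag}\{r_1^2, r_2^2, \ldots, r_6^2\}$, and
$V_k = [v_1(k), v_2(k), \ldots, v_6(k)]^T$.

Hence, we obtain the measurement equation

$$ Z_k = h[X_k] + V_k = H_{1,k} X_k + h_{1,k}[X_k] + V_k, \tag{7.8} $$

where $H_{1,k}$ is the linear part, which produces $Z_{k,1}, \ldots, Z_{k,4}$,

$$ H_{1,k} = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}, $$

and $h_{1,k}[X_k]$ is the nonlinear part, which produces $Z_{k,5}$ and $Z_{k,6}$,

$$ h_{1,k}[X_k] = \begin{bmatrix}
\arctan\dfrac{\dot e(k)}{\dot n(k)} + \gamma - \beta(k-1) + \delta_g(k) \\[2mm]
T\sqrt{\dot e^2(k) + \dot n^2(k)} + \delta_s(k)
\end{bmatrix}. $$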


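Since (7.8) is a nonlinear equation, there are many potential ways to linearize it
and thus obtain a solution [24]. Here, we use Extended Kalman filtering (EKF),
which linearly approximates the nonlinear measurement system around the latest
state estimate $\hat X_{k,k-1}$:

$$ Z_k \approx h[\hat X_{k,k-1}] + H[\hat X_{k,k-1}]\,(X_k - \hat X_{k,k-1}) + V_k, \tag{7.9} $$

where

$$ H[\hat X_{k,k-1}] = \begin{bmatrix}
I_{2\times2} & 0_{2\times2} & 0_{2\times2} & I_{2\times2} & 0_{2\times2} \\
0_{2\times2} & I_{2\times2} & 0_{2\times2} & 0_{2\times2} & 0_{2\times2} \\
0_{2\times2} & H_{2\times2} & 0_{2\times2} & 0_{2\times2} & I_{2\times2}
\end{bmatrix}, $$

$$ H_{2\times2} = \begin{bmatrix}
\dfrac{\dot n}{\dot e^2 + \dot n^2} & \dfrac{-\dot e}{\dot e^2 + \dot n^2} \\[3mm]
\dfrac{T\dot e}{\sqrt{\dot e^2 + \dot n^2}} & \dfrac{T\dot n}{\sqrt{\dot e^2 + \dot n^2}}
\end{bmatrix}_{X = \hat X_{k,k-1}}. $$

Letting $\tilde Z[\hat X_{k,k-1}] = Z_k - h[\hat X_{k,k-1}] + H[\hat X_{k,k-1}]\,\hat X_{k,k-1}$,
we have

$$ \tilde Z[\hat X_{k,k-1}] \approx H[\hat X_{k,k-1}]\,X_k + V_k. \tag{7.10} $$

Equation (7.10) is a linearized measurement equation. Note that a fundamental
limitation of the EKF is that, after the nonlinear transformation, the
distributions of the random variables are no longer Gaussian, so the EKF only
approximates the optimality of Bayes' rule by linearization.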
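The linearization in (7.9) can be verified numerically. The sketch below is illustrative only and rests on several assumptions: the state ordering is $[e, n, \dot e, \dot n, \ddot e, \ddot n, \varepsilon_e, \varepsilon_n, \delta_g, \delta_s]$, the heading is computed with `arctan2` in place of the arctangent plus the quadrant term $\gamma$, and the function names `measure` and `jacobian` as well as all numerical values are hypothetical. It compares the analytic Jacobian with a central finite difference.

```python
import numpy as np

def measure(x, T, beta_prev):
    """Noise-free measurement h[X] for the six measurements of Sect. 7.3.2."""
    e, n, ve, vn, _, _, eps_e, eps_n, d_g, d_s = x
    heading = np.arctan2(ve, vn)  # heading measured from the north (y) axis
    return np.array([e + eps_e,
                     n + eps_n,
                     ve,
                     vn,
                     heading - beta_prev + d_g,      # gyro rotation angle
                     T * np.hypot(ve, vn) + d_s])    # odometer distance

def jacobian(x, T):
    """Analytic 6x10 Jacobian of `measure` with respect to the state."""
    ve, vn = x[2], x[3]
    speed2 = ve**2 + vn**2
    speed = np.sqrt(speed2)
    H = np.zeros((6, 10))
    H[0, 0] = H[0, 6] = 1.0
    H[1, 1] = H[1, 7] = 1.0
    H[2, 2] = H[3, 3] = 1.0
    H[4, 2], H[4, 3], H[4, 8] = vn / speed2, -ve / speed2, 1.0
    H[5, 2], H[5, 3], H[5, 9] = T * ve / speed, T * vn / speed, 1.0
    return H

x0 = np.array([10.0, 20.0, 3.0, 4.0, 0.1, -0.2, 0.5, -0.5, 0.01, 0.02])
T, beta_prev = 0.1, 0.3
H = jacobian(x0, T)

# Central finite differences, one state component at a time.
step = 1e-6
basis = np.eye(10)
H_num = np.column_stack([
    (measure(x0 + step * basis[i], T, beta_prev)
     - measure(x0 - step * basis[i], T, beta_prev)) / (2 * step)
    for i in range(10)
])
max_err = np.max(np.abs(H - H_num))
```

The Jacobian is undefined at zero speed, so a practical filter would need to guard the division when the vehicle is stationary.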


7.3.3 Data Fusion Using EKF 


Based on (7.5) and (7.10), we use a nonzero-mean adaptive acceleration model to
represent the acceleration change [28]. We thus obtain the adaptive Kalman
filtering algorithm stated below:

1.
$$ \hat X_{k,k-1} = \Phi'_{k,k-1}\,\hat X_{k-1}, $$

where

$$ \Phi'_{k,k-1} = \begin{bmatrix}
I_{2\times2} & T I_{2\times2} & \tfrac{T^2}{2} I_{2\times2} & 0_{2\times2} & 0_{2\times2} \\
0_{2\times2} & I_{2\times2} & T I_{2\times2} & 0_{2\times2} & 0_{2\times2} \\
0_{2\times2} & 0_{2\times2} & I_{2\times2} & 0_{2\times2} & 0_{2\times2} \\
0_{2\times2} & 0_{2\times2} & 0_{2\times2} & E_2 & 0_{2\times2} \\
0_{2\times2} & 0_{2\times2} & 0_{2\times2} & 0_{2\times2} & F
\end{bmatrix}. $$

We would like to point out that $\Phi'_{k,k-1}$ is used to replace $\Phi_{k,k-1}$.
The rationale behind this is that it is equivalent to increasing the sampling rate
($T \to 0$). For details, we refer to [28].


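2.
$$ P_{k,k-1} = \Phi_{k,k-1} P_{k-1} \Phi_{k,k-1}^T + Q_{k-1}, $$
$$ P^1_{k,k-1} = \Phi^1_{k,k-1} P^1_{k-1} (\Phi^1_{k,k-1})^T + Q^1_{k-1}, $$

where the superscript 1 marks the quantities of the reduced filter on the first
eight states and

$$ \Phi^1_{k,k-1} = \begin{bmatrix}
I_{2\times2} & T I_{2\times2} & C_1 & 0_{2\times2} \\
0_{2\times2} & I_{2\times2} & C_2 & 0_{2\times2} \\
0_{2\times2} & 0_{2\times2} & E_1 & 0_{2\times2} \\
0_{2\times2} & 0_{2\times2} & 0_{2\times2} & E_2
\end{bmatrix}, \qquad
Q^1_{k-1} = \begin{bmatrix} Q_{11} & 0_{6\times2} \\ 0_{2\times6} & Q_{12} \end{bmatrix}. $$

3.
$$ K_k = P_{k,k-1} H^T[\hat X_{k,k-1}] \big\{ H[\hat X_{k,k-1}] P_{k,k-1} H^T[\hat X_{k,k-1}] + R_k \big\}^{-1}, $$
$$ K^1_k = P^1_{k,k-1} (H^1)^T[\hat X_{k,k-1}] \big\{ H^1[\hat X_{k,k-1}] P^1_{k,k-1} (H^1)^T[\hat X_{k,k-1}] + R_k \big\}^{-1}. $$

4.
$$ \hat X_k = \hat X_{k,k-1} + [G_1^T, G_2^T]^T, $$

where $G_1 = K^1_k (Z_k - h^1[\hat X_{k,k-1}])$, $G_2 = [g_1, g_2]^T$, and $g_1$ and
$g_2$ are the ninth and tenth elements of the vector
$K_k (Z_k - h[\hat X_{k,k-1}])$.

5.
$$ P^1_k = \big[I - K^1_k H^1[\hat X_{k,k-1}]\big] P^1_{k,k-1}, \qquad
   P_k = \big[I - K_k H[\hat X_{k,k-1}]\big] P_{k,k-1}. $$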
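The gain, state, and covariance updates in the steps above follow the standard EKF measurement update. The sketch below shows that update for a single generic filter; it is an assumption-laden illustration (the function name `ekf_update` and the scalar example are hypothetical) and does not reproduce the combination of the full and reduced filters described in the algorithm.

```python
import numpy as np

def ekf_update(x_pred, P_pred, z, h, H, R):
    """One EKF measurement update.

    x_pred, P_pred : predicted state and covariance
    z              : measurement vector
    h              : measurement function, evaluated at x_pred
    H              : Jacobian of h at x_pred
    R              : measurement noise covariance
    """
    innovation = z - h(x_pred)
    S = H @ P_pred @ H.T + R                   # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new

# Scalar sanity check: prior variance 4, measurement variance 1, measurement 2.
x_new, P_new = ekf_update(np.array([0.0]), np.array([[4.0]]),
                          np.array([2.0]), lambda x: x,
                          np.array([[1.0]]), np.array([[1.0]]))
```

In the scalar check the gain is 4/5, so the posterior mean is 1.6 and the posterior variance is 0.8.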


When a vehicle cannot receive DGPS signals at time $k'$, the above equations cannot
be used to obtain a correct solution, and the data fusion stops. In this case, the
following dead reckoning equations provide the vehicle position:

$$ e_{k'} = e_{k'-1} + S_{k'} \sin[\beta_{k'-1} + \Omega_{k'}], \qquad
   n_{k'} = n_{k'-1} + S_{k'} \cos[\beta_{k'-1} + \Omega_{k'}]. $$
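The dead reckoning fallback can be sketched in a few lines. This is a minimal illustration under the assumption that the heading is measured from the north (y) axis, as the sine and cosine terms above suggest; the function name `dead_reckon` and the numbers are hypothetical.

```python
import math

def dead_reckon(e_prev, n_prev, beta_prev, distance, rotation):
    """Advance the east/north position by one odometer/gyro reading.

    beta_prev : accumulated heading at the previous step (radians, from north)
    distance  : travelled distance S reported by the odometer
    rotation  : rotation angle Omega reported by the gyroscope
    """
    heading = beta_prev + rotation
    e = e_prev + distance * math.sin(heading)
    n = n_prev + distance * math.cos(heading)
    return e, n, heading

# Starting at the origin facing north, turn 90 degrees and drive 2 m:
# the vehicle moves 2 m east and (numerically) 0 m north.
e, n, beta = dead_reckon(0.0, 0.0, 0.0, 2.0, math.pi / 2)
```

Because each step adds the sensor errors of the previous ones, the position drifts while DGPS is unavailable; the fusion resumes once DGPS measurements return.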
Chapter 8
Vehicle Navigation Using Global Views 


8.1 Introduction 


Driver inattention is a major contributor to highway crashes. The National Highway
Traffic Safety Administration estimates that at least 25% of police-reported
crashes involve some form of driver inattention [15]. Driving requires a driver to
distribute his/her attention among different sub-tasks. First of all, a driver
needs to pay attention to issues directly related to safety, including the
surrounding traffic, dashboard displays, and other incoming information on the
road such as traffic lights and road signs. In addition, the driver may choose to
talk to a passenger, listen to the radio, or talk on a cell phone. Therefore,
situation awareness plays an important role in driving safety. In this research,
we develop technologies that provide a driver with information about the dynamic
surroundings of the vehicle while driving, in order to enhance his/her situation
awareness.

Situation awareness is defined as the perception of the elements in the environ- 
ment within a volume of time and space, the comprehension of their meaning, and 
the projection of their status in the near future [7]. Sensing and representing infor- 
mation is a key for situation awareness in driving a vehicle. A lot of research has 
been directed towards improving in-vehicle information presentation. Green et al. 
surveyed early studies on human factor tests in navigation displays [13]. They de- 
scribed objectives, principles and guidelines for the design of in-vehicle devices. 
Dale et al. investigated the problem of generating natural route descriptions for nav- 
igational assistance [6]. Lee et al. developed a situationally appropriate map system 
for drivers [14]. 

Navigation user interfaces have changed dramatically over the last few years due
to the availability of electronic maps and the Global Positioning System (GPS). The
displays in current GPS navigation systems show the location of a vehicle on a
graphical map in a way that is similar to looking straight down at a paper map.
Recently, several companies, such as Microsoft and Google, have started providing
global view maps, such as aerial imagery maps, satellite imagery maps, and bird's
eye view maps. For example, in the bird's eye view mode, Microsoft's Windows Live
Local provides high-resolution aerial imagery taken from four directions at an
angle rather than straight down. Besides the GPS, a vehicle can also obtain
information about the driving environment from other sensors, such as video
cameras mounted at various positions, thermal infrared imagers, RADAR, LIDAR, and
ultrasonic sensors [9]. Among these sensors, video cameras are attractive from a
packaging and cost perspective. Recent advances in computer vision and image
processing technologies have made it possible to apply video-based sensors along
with the GPS in driving assistance applications.

Here, we propose a novel method to enhance situation awareness by dynamically
providing drivers with a global view of their surroundings. The surroundings of a
vehicle are captured by an omnidirectional vision system mounted on top of the
vehicle. In order to obtain high-quality images of the surroundings, we use an
omnidirectional vision system consisting of multiple cameras [2, 5], rather than
the catadioptric camera used by most existing systems for intelligent vehicles
[1, 11]. The video stream from the cameras is processed to detect nearby vehicles
and obstacles. The positions of the detected objects are overlaid on a global view
map of the vehicle. We derive the mapping between the omnidirectional vision
system and the global view map. This map can be projected onto a Head-Up Display
(HUD) on the windshield to provide a realistic perspective view of the driving
environment. By looking at the display, a driver can get a global picture of the
situation and is more likely to choose a good driving strategy.

The rest of this chapter is organized as follows. Section 8.2 describes the
problem and the proposed approach. Section 8.3 discusses the imaging model of our
camera system. Section 8.4 presents the panoramic Inverse Perspective Mapping
(pIPM). Section 8.5 shows how to implement the pIPM. Section 8.6 introduces the
elimination of the wide-angle lens radial error. In Sect. 8.7, we illustrate the
proposed method with an example that maps vehicles detected in the video stream
captured by an omnidirectional vision system onto the Google Earth map.


8.2 The Problem and Proposed Approach 


The field of view, which is the part of the observable world that is seen at any
given moment, plays an important role in driving safety. While a human has an
almost 180-degree forward-facing field of view, his/her binocular vision, which is
important for depth perception, covers only 140 degrees of it. In a driving
situation, it is desirable to have a complete 360-degree field of view. In order to
expand a driver's field of view, automobile manufacturers equip vehicles with a
rear-view mirror and side mirrors. More recently, rear-view video cameras have
been added to many new car models to complement the rear-view mirror by showing
the road directly behind the car. These camera systems are usually mounted on the
bumper or lower parts of the car, allowing for better rear visibility. However,
looking at mirrors can move a driver's attention away from the road.

Fig. 8.1 Panorama images taken by a camera array sitting on top of a vehicle


Adding sensors and devices to a vehicle can potentially lead to more distractions.
Inattention is one of the leading causes of car accidents, estimated to account
for 25% of all road traffic accidents. Our goal, therefore, is to increase a
driver's field of view without adding sources of distraction. We propose to
capture the surroundings of a vehicle with an omnidirectional vision system
mounted on top of the vehicle and to display the dynamic global view on the
windshield using an HUD. In this way, a driver can obtain a global view of the
surroundings without shifting his/her attention away from the front view of the
vehicle.

Omnidirectional vision systems have previously been used in intelligent vehicle
applications such as vehicle tracking, indoor parking, and driver monitoring
[8, 10, 11, 16, 17]. These applications used different omnidirectional sensors,
such as wide Field-Of-View (FOV) dioptric cameras, catadioptric cameras,
Pan-Tilt-Zoom (PTZ) cameras, and polydioptric cameras. Both wide-FOV dioptric
cameras and catadioptric cameras have limitations. First, their images are heavily
distorted, and correcting the distortion takes considerable time. Second, they
cannot provide high-resolution images of the surroundings. PTZ cameras are often
used in environment surveillance, where the cameras are moved to cover the scene.
Although PTZ cameras can provide high-resolution images, the mechanical motion of
the cameras makes the system respond slowly. Instead of these cameras, we use an
omnidirectional vision system consisting of multiple cameras, which captures a
full view of the surroundings of a vehicle simultaneously at a resolution of up to
1600 x 320 [5]. Figure 8.1 shows two examples of panoramic images captured in our
experiment. The rectangles in the images mark the detected vehicles.

However, the panoramic video stream from the omnidirectional camera cannot be
easily understood by a driver, so we map the driving situation onto a global view
map. That is, we automatically extract objects (vehicles, pedestrians, etc.) from
the video stream and mark their positions on the global view map. We use a
hypothesis-validation structure to detect the vehicles surrounding the host
vehicle [4]. Without loss of generality, we use Google Earth, which provides
high-quality, high-resolution aerial and satellite images of highways, streets,
and more. In addition, the data import feature of Google Earth makes it possible
to import our custom geographic data into the Google Earth application and thus
to represent the sensed dynamic information surrounding the host vehicle.

Fig. 8.2 Geometric relationship among the vehicle, camera array, and image
coordinate system of each individual camera: (a) front view, (b) image plane of an
individual camera, (c) aerial view, (d) camera array layout


8.3 The Panoramic Imaging Model 


In this section, we describe the mathematical model of panoramic imaging and
provide a context for the mapping between the panoramic image and the global
electronic map. There are three different coordinate systems, as shown in Fig. 8.2:
$X_v Y_v Z_v$ is the vehicle coordinate system; $X_c Y_c Z_c$ is the coordinate
system of the individual camera $c$ in the camera array, where $c = 0, \ldots, N-1$
and $N$ is the number of cameras; and $UOV$ is the image coordinate system of
camera $c$. Let $r$ denote the radius of the camera array and $\theta_c = 2\pi/N$
the shift angle. The 3D coordinates of the camera array center are $[l, d, h]^T$ in
the vehicle coordinate system. The orientation of camera $c$ is defined by two
rotation angles $\alpha_c$ and $\beta_c$, as shown in Fig. 8.2(a) and (b).

Assuming the road surface is horizontal, the coordinates of the optical center of
camera $c$ in the $X_v Y_v Z_v$ coordinate system are

$$ T_c^v = [l + r\cos\beta_c,\; d + r\sin\beta_c,\; h]^T = [l', d', h]^T. \tag{8.1} $$

Given any 3D point, let $P_v = [x_v, y_v, z_v]^T$ denote its coordinates in the
vehicle coordinate system $X_v Y_v Z_v$, and let $P_c = [x_c, y_c, z_c]^T$ denote
its coordinates in the camera coordinate system $X_c Y_c Z_c$. We have

$$ P_v = \begin{bmatrix}
\cos\alpha_c\cos\beta_c & \sin\beta_c & -\sin\alpha_c\cos\beta_c \\
-\cos\alpha_c\sin\beta_c & \cos\beta_c & \sin\alpha_c\sin\beta_c \\
\sin\alpha_c & 0 & \cos\alpha_c
\end{bmatrix} P_c + T_c^v. \tag{8.2} $$

Consider camera $c$'s image plane $UOV$ as shown in Fig. 8.2(b). Let $(u_0, v_0)$
denote the coordinates of the principal point. According to perspective projection,
any point $S(u, v)$ on the image plane satisfies the following equation

$$ \frac{y_c}{x_c} = \frac{u - u_0}{f_u}, \qquad \frac{z_c}{x_c} = -\frac{v - v_0}{f_v}, \tag{8.3} $$

where $f_u$ and $f_v$ are the scale factors along the U- and V-axes, respectively.
Let $x_c = t$, $t \in [0, \infty)$; then applying (8.3) yields

$$ x_c = t, \qquad y_c = t\,\frac{u - u_0}{f_u}, \qquad z_c = -t\,\frac{v - v_0}{f_v}. \tag{8.4} $$

Therefore, the parametric equation of the line $O_c S$ can be written as

$$ [x_c, y_c, z_c]^T = t\left[1,\; \frac{u - u_0}{f_u},\; -\frac{v - v_0}{f_v}\right]^T. \tag{8.5} $$

Substituting (8.5) into (8.2), we obtain the line equation in the vehicle coordinate
system

$$ \begin{bmatrix} x_v \\ y_v \\ z_v \end{bmatrix}
   = tR \begin{bmatrix} 1 \\ \frac{u - u_0}{f_u} \\ -\frac{v - v_0}{f_v} \end{bmatrix}
   + \begin{bmatrix} l' \\ d' \\ h \end{bmatrix}, \tag{8.6} $$

where $R$ is the rotation matrix in (8.2).

If we assume the road is flat, the equation of the road plane is $z_v = 0$.
Therefore, the intersection of the line $O_c S$ with the road plane can be obtained
by setting $z_v = 0$ in (8.6), which yields

$$ x_v(u, v) = h'\,[u'\ \ v'\ \ 1] \begin{bmatrix} \sin\beta_c \\ \sin\alpha_c\cos\beta_c \\ \cos\alpha_c\cos\beta_c \end{bmatrix} + l', $$
$$ y_v(u, v) = h'\,[u'\ \ v'\ \ 1] \begin{bmatrix} \cos\beta_c \\ -\sin\alpha_c\sin\beta_c \\ -\cos\alpha_c\sin\beta_c \end{bmatrix} + d', \tag{8.7} $$

where $h' = \dfrac{h}{v'\cos\alpha_c - \sin\alpha_c}$, $u' = \dfrac{u - u_0}{f_u}$,
and $v' = \dfrac{v - v_0}{f_v}$.

Note that the object detection is performed on the stitched panoramic image.
Given a pixel on the panoramic image, its corresponding positions on the individual
camera image planes are obtained from the stitching table, which is generated by a
stitching calibration process [5].
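The inverse perspective mapping (8.7) can be cross-checked against a direct ray-plane intersection based on (8.2) and (8.6). The sketch below is illustrative only: the function names, the camera parameters, and the pixel are hypothetical, and the sign convention $z_c/x_c = -(v - v_0)/f_v$ of (8.3) is assumed.

```python
import numpy as np

def rotation(alpha, beta):
    """Rotation matrix of (8.2) for camera angles alpha and beta (radians)."""
    ca, sa, cb, sb = np.cos(alpha), np.sin(alpha), np.cos(beta), np.sin(beta)
    return np.array([[ca * cb, sb, -sa * cb],
                     [-ca * sb, cb, sa * sb],
                     [sa, 0.0, ca]])

def pixel_to_road(u, v, alpha, beta, l_c, d_c, h, u0, v0, fu, fv):
    """Map an image pixel to the road plane z_v = 0 using (8.7).

    l_c, d_c, h are the optical-center coordinates (l', d', h) in the vehicle frame.
    """
    up, vp = (u - u0) / fu, (v - v0) / fv
    h_prime = h / (vp * np.cos(alpha) - np.sin(alpha))
    row = np.array([up, vp, 1.0])
    x = h_prime * (row @ np.array([np.sin(beta),
                                   np.sin(alpha) * np.cos(beta),
                                   np.cos(alpha) * np.cos(beta)])) + l_c
    y = h_prime * (row @ np.array([np.cos(beta),
                                   -np.sin(alpha) * np.sin(beta),
                                   -np.cos(alpha) * np.sin(beta)])) + d_c
    return x, y

x_road, y_road = pixel_to_road(u=400.0, v=340.0, alpha=0.2, beta=0.7,
                               l_c=1.0, d_c=0.5, h=1.5,
                               u0=320.0, v0=240.0, fu=200.0, fv=200.0)

# Direct check: intersect the viewing ray of (8.6) with the plane z_v = 0.
ray = rotation(0.2, 0.7) @ np.array([1.0, 0.4, -0.5])
t = -1.5 / ray[2]
x_direct, y_direct = t * ray[0] + 1.0, t * ray[1] + 0.5
```

Only pixels with $v' > \tan\alpha_c$ give a positive ray parameter, i.e. a road point in front of the camera; pixels above that line look at or above the horizon and have no valid road intersection.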
Fig. 8.3 Mapping from a single image to panoramic image


8.4 The Panoramic Inverse Perspective Mapping (pIPM)


8.4.1 The Mapping Relationship Between Each Image and 
a Panoramic Image 


In this section, we build the mapping relationship between the image of each
camera and the panoramic image. Let $(u_c, v_c)$ represent the image coordinates of
the $c$th camera. Define the cylindrical panoramic image coordinates captured by
all $N$ cameras as $(\theta_p, v_p)$, where $\theta_p \in [0, 2\pi)$ is the panning
angle shown in Fig. 8.3. According to Fig. 8.3, the mapping between the $c$th
camera coordinates $(u_c, v_c)$ and the panoramic image pixel coordinates
$(\theta_p, v_p)$ is

$$ \theta_p = \frac{2\pi}{W_p}\big[u_c - (W_c - u_0)\big] + \beta_c, \qquad v_p = v_c, \tag{8.8} $$

where $W_c$ is the width of a single image (here we assume that all cameras have the
same image width) and $W_p$ is the width of the panoramic image. To guarantee
$\theta_p \in [0, 2\pi)$, we apply the following operation

$$ \theta_p \bmod 2\pi \to \theta_p, \qquad \theta_p \in [0, 2\pi). \tag{8.9} $$

To simplify the description, we assume that each pixel of the panoramic image
comes from a single camera.

8.4.2 The Panoramic Inverse Perspective Mapping


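From (8.8), we obtain

$$ u_c = \frac{W_p}{2\pi}(\theta_p - \beta_c) + (W_c - u_0), \qquad v_c = v_p. \tag{8.10} $$

Substituting (8.10) into (8.6), we obtain the mapping relationship between a pixel
$(\theta_p, v_p)$ within the panoramic image and a 3D point $(x_v, y_v, z_v)$ in the
vehicle coordinate system as below

$$ \begin{bmatrix} x_v \\ y_v \\ z_v \end{bmatrix}
 = tR \begin{bmatrix} 1 & 0 & 0 \\ 0 & f_u^{-1} & 0 \\ 0 & 0 & -f_v^{-1} \end{bmatrix}
 \left( \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{W_p}{2\pi} & 0 \\ 0 & 0 & 1 \end{bmatrix}
 \begin{bmatrix} 1 \\ \theta_p \\ v_p \end{bmatrix}
 - \begin{bmatrix} 0 \\ \frac{W_p}{2\pi}\beta_c - W_c + 2u_0 \\ v_0 \end{bmatrix} \right) + T_c^v. \tag{8.11} $$

From (8.1), we know that both the rotation matrix $R$ and the translation vector
$T_c^v = [l', d', h]^T$ are functions of $\beta_c$, that is, $R = R(\beta_c)$,
$l' = l'(\beta_c)$, $d' = d'(\beta_c)$. The camera external parameter $\beta_c$ is
a piecewise constant function of $\theta_p$:

$$ \beta_c(\theta_p) = -(c\theta_c + \theta_0), \qquad
   \theta_p \in \Big(-\big(\theta_0 - \Delta\theta_c + (c + \tfrac{1}{2})\theta_c\big),\;
                     -\big(\theta_0 - \Delta\theta_c + (c - \tfrac{1}{2})\theta_c\big)\Big), \tag{8.12} $$

where $\Delta\theta_c = \frac{2\pi}{W_p}\big(u_0 - \frac{W_c}{2}\big)$ is the constant
angle by which the image center deviates from the camera center, and $\theta_0$ is
the angle of camera 0 w.r.t. $X_v$. As in (8.9), we apply the same operation to
guarantee $\beta_c, \theta_p \in [0, 2\pi)$.

Similar to (8.7), substituting the road surface constraint $z_v = 0$ into (8.11),
we obtain the mapping relationship between a 3D point $(x_v, y_v, 0)$ on the road in
the vehicle coordinate system and a point $(\theta_p, v_p)$ in the panoramic image in
the following form

$$ x_v(\theta_p, v_p) = h'_p\,[u'_p\ \ v'_p\ \ 1]
   \begin{bmatrix} \sin\beta_c(\theta_p) \\ \sin\alpha_c\cos\beta_c(\theta_p) \\ \cos\alpha_c\cos\beta_c(\theta_p) \end{bmatrix} + l', $$
$$ y_v(\theta_p, v_p) = h'_p\,[u'_p\ \ v'_p\ \ 1]
   \begin{bmatrix} \cos\beta_c(\theta_p) \\ -\sin\alpha_c\sin\beta_c(\theta_p) \\ -\cos\alpha_c\sin\beta_c(\theta_p) \end{bmatrix} + d', \tag{8.13} $$

where $h'_p = \dfrac{h}{v'_p\cos\alpha_c - \sin\alpha_c}$,
$u'_p = \dfrac{\frac{W_p}{2\pi}(\theta_p - \beta_c) + (W_c - 2u_0)}{f_u}$, and
$v'_p = \dfrac{v_p - v_0}{f_v}$. Comparing (8.7) and (8.13), we see that perspective
imaging and panoramic imaging share a unified IPM form.

Fig. 8.4 The illustration of the FOV of the panoramic camera in the vehicle
coordinate system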


8.5 The Implementation of the pIPM 


8.5.1 The Field of View of N Cameras in the Vehicle Coordinate 
System 


The first step of the implementation of the pIPM is to determine the Field of View
(FOV), $x_v \in [-H_v/2, H_v/2]$, $y_v \in [-W_v/2, W_v/2]$, shown in Fig. 8.4.


8.5.2 Calculation of Each Interest Point's View Angle in the
Vehicle Coordinate System


For each point in the vehicle coordinate system, we calculate its view angle and
determine the corresponding mapping camera. In Fig. 8.4, $X_v O Y_v$ is the vehicle
coordinate system and $\theta_v$ is the view angle. We calculate $\theta_v$ from
$x_v$ and $y_v$ using

$$ \theta_v = \begin{cases}
\big[2 - \mathrm{sgn}(y_v - d_s)\big]\frac{\pi}{2}, & x_v - l_s = 0 \ (Y_v \text{ axis}), \\[1mm]
\big[1 - \mathrm{sgn}(x_v - l_s)\big]\frac{\pi}{2}, & y_v - d_s = 0 \ (X_v \text{ axis}), \\[1mm]
\arctan\frac{y_v - d_s}{x_v - l_s}, & x_v - l_s > 0 \text{ and } y_v - d_s > 0, \\[1mm]
\pi + \arctan\frac{y_v - d_s}{x_v - l_s}, & x_v - l_s < 0 \text{ and } y_v - d_s \neq 0, \\[1mm]
2\pi + \arctan\frac{y_v - d_s}{x_v - l_s}, & x_v - l_s > 0 \text{ and } y_v - d_s < 0,
\end{cases} \tag{8.14} $$

where $l_s = \mathrm{sgn}(x_v)\,l$, $d_s = \mathrm{sgn}(y_v)\,d$, and
$\mathrm{sgn}(\cdot)$ is the sign function. As a result, we can determine the image
plane and $\beta_c$ of $(x_v, y_v)$ given $\theta_v$ by (8.12).
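The quadrant-by-quadrant definition of the view angle can be written compactly with a two-argument arctangent. The sketch below is one possible formulation, assuming that the view angle is measured counterclockwise from the $X_v$ axis and wrapped to $[0, 2\pi)$; the function names and the sample points are hypothetical.

```python
import math

def sign(a: float) -> int:
    """Sign function: -1, 0, or 1."""
    return (a > 0) - (a < 0)

def view_angle(x_v: float, y_v: float, l: float, d: float) -> float:
    """View angle of a road point, wrapped to [0, 2*pi)."""
    dx = x_v - sign(x_v) * l   # x_v - l_s
    dy = y_v - sign(y_v) * d   # y_v - d_s
    return math.atan2(dy, dx) % (2.0 * math.pi)

theta_q1 = view_angle(2.0, 1.0, l=1.0, d=0.0)    # first quadrant
theta_q2 = view_angle(-2.0, 1.0, l=1.0, d=0.0)   # second quadrant
theta_q4 = view_angle(2.0, -1.0, l=1.0, d=0.0)   # fourth quadrant
```

The two-argument arctangent handles the points on the axes without special cases, which avoids the division by zero in the one-argument form.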
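8.5.3 The Mapping Relationship Between a 3D On-road Point and
a Panoramic Image


From (8.13), we obtain

$$ R^{-1} \begin{bmatrix} x_v - l' \\ y_v - d' \\ z_v - h \end{bmatrix}
 = t \begin{bmatrix} 1 \\ \dfrac{\frac{W_p}{2\pi}(\theta_p - \beta_c) + (W_c - 2u_0)}{f_u} \\[2mm] -\dfrac{v_p - v_0}{f_v} \end{bmatrix}. \tag{8.15} $$

Using the first line of the above equation and $z_v = 0$, we obtain

$$ t = \big[R^{-1}_{11},\ R^{-1}_{12},\ R^{-1}_{13}\big]
       \begin{bmatrix} x_v - l' \\ y_v - d' \\ -h \end{bmatrix}. \tag{8.16} $$

Similarly, using the results of the above formula, we obtain

$$ \theta_p = \frac{2\pi}{W_p}\left\{ f_u t^{-1} \big[R^{-1}_{21},\ R^{-1}_{22},\ R^{-1}_{23}\big]
   \begin{bmatrix} x_v - l' \\ y_v - d' \\ -h \end{bmatrix} - (W_c - 2u_0) \right\} + \beta_c, $$
$$ v_p = -f_v t^{-1} \big[R^{-1}_{31},\ R^{-1}_{32},\ R^{-1}_{33}\big]
   \begin{bmatrix} x_v - l' \\ y_v - d' \\ -h \end{bmatrix} + v_0, \tag{8.17} $$

where $R^{-1}_{ij}$ is the element of the matrix $R^{-1}$ in the $i$th row and the
$j$th column.

Substituting $\theta_v$ from (8.14), $\beta_c$ from (8.12), and $t$ from (8.16) into
(8.17), we can obtain the mapping relationship between a panoramic coordinate
$(\theta_p, v_p)$ and its corresponding point $(x_v, y_v)$ in the vehicle coordinate
system.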


8.5.4 Image Interpolation in the Vehicle Coordinate System 


As we know, $(\theta_p, v_p)$ calculated from $(x_v, y_v, 0)$ may fall between
pixels. Therefore, we have to calculate the intensity value at each such position
of the panoramic image by interpolation. Set
$(\bar\theta_p, \bar v_p) = (\lfloor\theta_p\rfloor, \lfloor v_p\rfloor)$, where
$\lfloor\cdot\rfloor$ is the floor function. Let $\rho_1$ and $\rho_2$ denote the
distances between $(\theta_p, v_p)$ and $(\bar\theta_p, \bar v_p)$ along $\theta$ and
$v$, respectively, as shown in Fig. 8.5. Here, $0 \le \rho_1, \rho_2 < 1$.
Therefore, the intensity value of $(x_v, y_v)$ is

$$ \begin{aligned}
I_v(x_v, y_v) ={}& I_p(\bar\theta_p, \bar v_p)(1 - \rho_1)(1 - \rho_2)
                 + I_p(\bar\theta_p, \bar v_p + 1)(1 - \rho_1)\rho_2 \\
               &+ I_p(\bar\theta_p + \Delta\theta_p, \bar v_p)\,\rho_1(1 - \rho_2)
                 + I_p(\bar\theta_p + \Delta\theta_p, \bar v_p + 1)\,\rho_1\rho_2,
\end{aligned} \tag{8.18} $$

where $\Delta\theta_p = 2\pi/W_p$. We would like to point out that better
interpolation algorithms can also be considered to improve display performance.
Figure 8.6 shows the experimental results of the pIPM algorithm.

Fig. 8.5 The illustration of non-integer image interpolation

Fig. 8.6 The results of the pIPM algorithm
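The interpolation (8.18) is ordinary bilinear interpolation. The sketch below works in pixel units, i.e. the column index stands for $\theta_p / \Delta\theta_p$; the function name, the tiny test image, and the choices to wrap the panorama horizontally and to clamp it at the bottom row are assumptions made for illustration.

```python
import numpy as np

def bilinear_sample(image: np.ndarray, col: float, row: float) -> float:
    """Bilinearly interpolate a panoramic image at a non-integer position.

    image : 2D array indexed as image[row, col]
    col   : horizontal position in pixels (theta_p / delta_theta_p)
    row   : vertical position in pixels (v_p)
    """
    c0, r0 = int(np.floor(col)), int(np.floor(row))
    rho1, rho2 = col - c0, row - r0
    c1 = (c0 + 1) % image.shape[1]          # panorama wraps around horizontally
    r1 = min(r0 + 1, image.shape[0] - 1)    # clamp at the bottom edge
    return (image[r0, c0] * (1 - rho1) * (1 - rho2)
            + image[r1, c0] * (1 - rho1) * rho2
            + image[r0, c1] * rho1 * (1 - rho2)
            + image[r1, c1] * rho1 * rho2)

panorama = np.array([[0.0, 10.0],
                     [20.0, 30.0]])
value = bilinear_sample(panorama, col=0.25, row=0.5)
```

For the sample position the four weights are 0.375, 0.375, 0.125, and 0.125, giving an interpolated intensity of 12.5.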


8.6 The Elimination of Wide-Angle Lens’ Radial Error 


Because the number of cameras is limited, we often use wide-angle lenses to
increase the angular field of view. However, a wide-angle lens causes radial
distortion; the model is shown below

$$ \begin{bmatrix} u_d \\ v_d \end{bmatrix}
 = \big(1 + \kappa_1 r^2 + \kappa_2 r^4 + \kappa_5 r^6\big)
   \begin{bmatrix} u_c \\ v_c \end{bmatrix} + dx, \tag{8.19} $$

where $(u_d, v_d)$ are the distorted image coordinates,
$dx = \begin{bmatrix} 2\kappa_3 u_c v_c + \kappa_4 (r^2 + 2u_c^2) \\ \kappa_3 (r^2 + 2v_c^2) + 2\kappa_4 u_c v_c \end{bmatrix}$,
and $r^2 = u_c^2 + v_c^2$.

Fig. 8.7 Google global navigation map


8.7 Combining Panoramic Images with Electronic Maps 


Electronic map services such as Microsoft Virtual Earth and Google Earth can help
reduce a driver’s load by providing high-quality electronic routes and turn-by-turn
directions. For example, Fig. 8.7 shows a route around Carnegie Mellon University
generated by Google Earth.

We can further reduce a driver’s cognitive load by combining the images captured 
by the omnidirectional camera with the electronic map in real time. In particular, we 
perform image analysis to detect surrounding objects such as vehicles and pedestri- 
ans, and display the detected objects on the electronic map. 

In this approach, we mainly focus on vehicle detection. Our vehicle detection 
approach includes two basic phases. In the hypothesis generation phase, we first 
determine the Regions of Interest (ROI) in an image according to lane vanishing 
points. From the analysis of horizontal and vertical edges in the image, vehicle hy- 
pothesis lists are generated for each ROI. In the hypothesis validation phase, we have 
developed a vehicle validation system by using Support Vector Machine (SVM) and 
Gabor features. For details we refer to [3, 4]. Figure 8.1 shows the results of vehicle 
detection in two omnidirectional images. 

Let $V_v = [\phi, \psi, \eta]^T$ denote the state of the host vehicle, where $\phi$ and $\psi$
are provided by an in-vehicle GPS device, and $\eta$ is the direction of the vehicle. Let
$(x_v(u, v), y_v(u, v))$ denote the coordinates of a detected vehicle. Then the latitude
and longitude of the detected vehicle can be written as [12]:

$$\begin{bmatrix}\phi_o \\ \psi_o\end{bmatrix} = \begin{bmatrix} k_1\cos\eta & -k_1\sin\eta \\ k_2\sin\eta & k_2\cos\eta \end{bmatrix}\begin{bmatrix} x_v \\ y_v \end{bmatrix} + \begin{bmatrix}\phi \\ \psi\end{bmatrix}, \qquad (8.20)$$

where $k_1$ and $k_2$ are scalar values that put the points into the earth’s longitude and
latitude coordinate system. Thus we can display the detected vehicle on an electronic
map. Figure 8.8 shows the detected vehicles from the two panoramic images in
Fig. 8.1 displayed on the Google Earth map.
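The mapping from a detected vehicle to map coordinates can be sketched in a few lines of Python: rotate the vehicle's offset by the host direction, scale it, and add the host position. This is only an illustration. The function name, the argument names, and the assumption that the scale factors convert metres to degrees are not taken from the text.

```python
import math

def to_geographic(phi, psi, eta, x_v, y_v, k1, k2):
    """Map a detected vehicle's offset (x_v, y_v) to map coordinates.

    phi, psi : host position from GPS (assumed here to be in degrees)
    eta      : host direction in radians
    k1, k2   : scale factors (assumed: degrees per metre along each axis)
    """
    # Rotate the offset by the host direction, scale, then translate.
    phi_o = k1 * (math.cos(eta) * x_v - math.sin(eta) * y_v) + phi
    psi_o = k2 * (math.sin(eta) * x_v + math.cos(eta) * y_v) + psi
    return phi_o, psi_o
```

In practice $k_1$ and $k_2$ depend on latitude and on the units of the image-derived offsets, so they would have to be calibrated for the test site.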
Fig. 8.8 Mapping from detected objects onto Google Earth map: (a) The objects are detected from
the top image in Fig. 8.1; (b) The objects are detected from the bottom image in Fig. 8.1
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Chapter 9 
The Lateral Motion Control for Intelligent 
Vehicles 


9.1 Introduction 


The goals of intelligent transportation systems are to improve the capacity of ex- 
isting highways, and simplify manual operations under various road conditions. 
Lane following systems are capable of providing safer and more efficient position- 
ing commands. Hence, the functions of lane following systems are twofold. First, 
the sensing system must calculate the radius of curvature of the road and the posi- 
tion of the vehicle relative to the road. Second, the lateral controller must not only 
track the center of the road but also steer the vehicle. In this case, the error between 
the reference path and the actual path is kept minimal by the control at the cost of 
both comfort and stability. A typical lane following system consists of four main 
parts: a lateral controller, a steering wheel, a vehicle and some sensors, as shown in 
Fig. 9.1. The actual driving response to a road is illustrated in Fig. 9.2. 

The rest of this chapter is organized as follows. Section 9.2 reviews the re- 
lated work. Section 9.3 introduces the proposed mixed lateral control strategy. In 
Sect. 9.4, we present the relationship between motor pulses and the front wheel lean 
angle. 


9.2 Related Work 


Vehicle lateral motion control plays a fundamental role in path following [5],
Automated Highway Systems (AHS) [2, 3, 9, 11, 12, 15, 18], Advanced Safety
Vehicles (ASV) [8], and Automated Formation Changes (AFC) [17], and it has been
studied extensively. In the AHS, the goal of lateral control is to make vehicles follow
road/lane marks under various driving conditions, speeds, loads, and road types while
maintaining good comfort and stability. Toward this end, Peng et al. combined a
feedback controller and a feed-forward controller to improve ride quality by using
the Frequency-Shaped Linear Quadratic (FSLQ) method [12]. The feedback controller uses
the FSLQ to improve performance, while the feed-forward controller generates
preview steering commands when the curvature of the upcoming road is available.
Fig. 9.1 A typical lateral control framework
Fig. 9.2 An illustration of actual driving response (axes: direction versus time)


Furthermore, the authors present continuous deterministic preview control consist- 
ing of feedback terms and two feed-forward terms [13]. As we know, both safety 
and passenger comfort are important for buses. Therefore, to obtain superior sta- 
bility and maneuverability, Matsumoto et al. controlled lateral velocity and yaw 
rate at the same time by inputting both the force between the front wheels and rear 
wheels, and the rear steering angle [7]. In Real-time Autonomous Navigator with 
a Geometric Engine (RANGER), Kelly presented the state space representation of 
a multi-input multi-output linear system which acts as the perceive-think-act loop 
for a robot vehicle [6]. To maintain smoothness of the steering system at both high 
speed and low speed, multiple look-ahead points were introduced to keep tighter 
turns at low speed. Here, one is used to obtain the deviation from the path while the 
rest are used to predict the steering angle for feed-forward control. Fraichard et al. 
proposed the Execution Monitor (EM) whose goal is to follow a given trajectory and 
respond to unexpected events in real time, thus generating control commands [4]. 


9.3 The Mixed Lateral Control Strategy 


To maintain the smoothness of the steering system, different control strategies 
should be used to control the steering system for linear and curvilinear roads, re- 
spectively. Let us first see how a human driver drives a car. When a car enters into 
Table 9.1 The relationship between vehicle velocity, viewpoint angle and focusing-on distance

Vehicle velocity (m/s)    16.667  22.222  27.778  33.333  38.889
Angle of viewpoint (°)    43      30      20      11      7
Look-ahead distance (m)   180     300     420     540     640
Look-ahead time (s)       10.8    13.5    15.1    16.2    16.5


a curvilinear road region, the driver perceives the curvature of the road by eye.
The driver then applies a proper steering angle, thus making a smooth turn
according to the current conditions. Driverless vehicles should work in the same
way. On linear roads, the in-car computer calculates the look-ahead distance as the
input of the controller that directly controls the steering wheel angle of the vehicle.
When the control system gives a steering signal, the actuator responds very quickly
and the steering magnitude is very small. By contrast, when the car is entering a
curvilinear road, the in-car computer first obtains the radius of the curve and the
steering angle, and then generates steering commands. To simulate human driving,
we introduce two different control strategies to adapt to different road conditions.


9.3.1 Linear Roads 


1. Determining Look-Ahead Distance A human driver focuses on a point at a specified
distance ahead of the vehicle when driving. This distance is called the look-ahead
distance, and it is related to speed. The relationship among vehicle velocity,
viewpoint angle, and focusing-on distance is shown in Table 9.1.

From Table 9.1, we can see that the focusing-on distance $D$ varies with
both vehicle velocity $v$ and angle of viewpoint. Moreover, a linear fit of distance
against velocity gives

$$D = 20.88v - 164.01. \qquad (9.1)$$

After repeating experiments many times, we obtained the empirical formulation of
the look-ahead distance as

$$d_s = D/10. \qquad (9.2)$$
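As a check on (9.1), an ordinary least-squares line through the velocity and look-ahead distance rows of Table 9.1 gives coefficients close to those quoted. The sketch below uses plain Python. The helper names `fit_line` and `look_ahead_distance` are illustrative, and the third velocity is read as 27.778 m/s (100 km/h).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Velocity (m/s) and look-ahead distance (m) from Table 9.1.
velocity = [16.667, 22.222, 27.778, 33.333, 38.889]
distance = [180.0, 300.0, 420.0, 540.0, 640.0]

slope, intercept = fit_line(velocity, distance)

def look_ahead_distance(v):
    """Empirical look-ahead distance d_s = D / 10, following (9.2)."""
    return (slope * v + intercept) / 10.0
```

The fitted line is only meaningful inside the tabulated speed range. Below about 8 m/s it would return a negative distance, so a real controller would need a lower bound.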


2. Calculating Look-Ahead Error The actual path of a driving vehicle can
deviate from its reference path due to rough roads, crosswinds, and initial errors.
Hence, we need to correct the front wheel lean angle, thus keeping the error between 
the reference path and the actual path at a minimum. The look-ahead distance error 
yp is the distance between point A and B in Fig. 9.3. The slip angle is defined as 
follows: 


$$\theta_s = \arctan\left(\frac{\dot{y}_u}{\dot{x}_u}\right), \qquad (9.3)$$
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Fig. 9.3 The geometry 
model of lateral deviation 





where $\dot{y}_u$ and $\dot{x}_u$ are the lateral velocity and longitudinal velocity, respectively.
Usually, we assume that $\theta_s$ is negligible since the lateral velocity is much smaller than
the longitudinal velocity. As a result, we have [5]


ys © Yu + XuEr + dg&i + (2 Js. (9.4) 
u 
where £, = £1 — &;; € is the angle between the vehicle coordinates and the ground 
coordinates; £+ is the absolute yaw of the reference trajectory in the inertial coordi- 
nate frame. 
In addition, the driving vehicle undergoes a lateral slip displacement due to the
rotation of the front wheels. The displacement $y_\beta$ is then defined in the following form

$$y_\beta = d_s\tan(\beta), \qquad (9.5)$$
where 
p= z arcsin( SELES?) (9.6) 
2 Xs 


Finally, we calculate the look-ahead distance error as

$$y_e = y_s + y_\beta. \qquad (9.7)$$


9.3.2 Curvilinear Roads 


For linear roads, the look-ahead strategy focused on a fixed point. In other words,
we only considered the error between the reference path and the actual path at a
specified point. This strategy ignores other geometric information about the road in
front of the vehicle, such as its orientation and curvature, and the performance of
the controller degrades as a result. In this section, we incorporate more road
geometric information into the controller.
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Fig. 9.4 The geometry 
model of lateral deviation 





1. Existing Shape Representation Assume that object contours consist of lines 
and arcs, where corners are intersection points between lines/arcs and lines/arcs, as 
shown in Fig. 9.4. The shape representation of a contour is to keep the number of 
both lines and arcs at a minimum by segmenting approaches. There are two types of 
contour segmenting techniques: the first one is direct [14, 19], the other is indirect 
[1, 10]. The direct approaches are to segment object shapes using point sets of object 
contours [14, 19], while the indirect approaches are to formulate the problem of 
segmenting contours into the characteristic functions of the contours [1, 10]. 

The Direct Approaches: Let $P_s$ and $P_e$ denote the start point and end point of
a contour, respectively. First, we calculate the distance $d_i$ between each point $P_i$ and
the line $P_sP_e$. If $d_{max} = \max_i\{d_i\} < \varepsilon_0$ ($\varepsilon_0$ is a small constant), the contour $P_sP_e$ is a
line. Second, if $d_{max} > \varepsilon_0$, we continue to segment the contour until $d_{max} < \varepsilon_0$
for every piece. Finally, the direct approaches generate subsets of points which
approximate lines. The advantages of the direct approaches are as follows: (i) They are
simple to implement; (ii) They can achieve very high precision of shape
representation by adjusting $\varepsilon_0$. However, their disadvantages are as follows: (i) They
carry a high computing burden due to the large number of distance computations
between points and lines; (ii) The value of $\varepsilon_0$ greatly affects the result of contour
segmentation; (iii) The segmentation results contain only lines, not circles/arcs.
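The direct approach described above can be sketched as a recursive split: find the point farthest from the chord, and split there if its distance reaches the threshold. The code below is one common way to implement this idea (it resembles the Ramer-Douglas-Peucker scheme) and is not necessarily the exact procedure of [14, 19]. The function names and the parameter `eps0` are illustrative.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    # |cross(b - a, a - p)| / |b - a|
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length

def split_contour(points, eps0):
    """Return indices of break points so that every piece stays within eps0 of its chord."""
    def recurse(start, end):
        d_max, idx = 0.0, None
        for i in range(start + 1, end):
            d = point_line_distance(points[i], points[start], points[end])
            if d > d_max:
                d_max, idx = d, i
        if idx is None or d_max < eps0:
            return []                      # this piece is accepted as a line
        return recurse(start, idx) + [idx] + recurse(idx, end)

    return [0] + recurse(0, len(points) - 1) + [len(points) - 1]
```

Every recursion level recomputes point-to-chord distances, which illustrates the computing burden noted above, and the output contains only line segments.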

The Indirect Approaches: The curvature extreme approaches are typical indirect
approaches. In these approaches, a point $P_i$ whose curvature $C_i$ is larger than the
threshold $C_T$ is a contour corner. We can see that: (i) It is convenient to calculate the
curvature once for each point; (ii) Segmented contours are invariant with respect to rotation.
Unfortunately, the indirect approaches are very sensitive to noise. Moreover, the
curvature extremes correspond to contour corners, not tangent points. To address these
problems, we propose a segmenting approach of contours based on the sum of ideal contours.


2. The Proposed Segmenting Approach of Contours Let us discuss the real 
contour in Fig. 9.4. Assume that the contour consists of lines and arcs. Given the 
types of segmented contours and the positions of all the corners and tangent points, 
we can obtain the curves of the curvature and its sum, shown in Fig. 9.5. In Fig. 9.5,
the curvature values of lines AB, BC, and EF are 0, and the curvature values of arcs CD
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Fig. 9.5 Ideal contours: 
(a) curvatures, (b) the sum of 
curvatures 








and DE are $-1/R_1$ and $1/R_2$, respectively. From Fig. 9.5, we can see that:


1. The corners of contours correspond to the pulses of curvature $C_p$ that are larger
than $C_T$.

2. There does not exist a corner between two neighboring corners, but there could
exist a tangent point. The contour segment between two corners is either a line
or an arc if there is no tangent point. Otherwise, the contour segment consists of
a line and an arc which are tangent.

3. A line contour segment without a tangent point between two neighboring corners
must satisfy two conditions. First, the curvatures of all the points are 0. Second,
the sum of curvatures of all the points is 0.

4. A tangent point between two neighboring corners has the following properties.
The curvature change at the tangent point corresponds to a step wave. At the
same time, the sum of its curvature changes suddenly.

5. If a contour segment satisfies Condition 3, it is a line. Otherwise, it is an arc.

6. The orientations of arc contours can be determined by the signs of the curvatures
of arcs. Correspondingly, the signs of the curvatures skew can determine its
orientation.


Using the above properties, we can segment a contour consisting only of lines
and arcs. However, real contours can be affected by a large amount of random
noise. Figure 9.6 shows the curvatures of a real contour. We can see that: (i) The
curvatures of a real contour are not horizontal but curved. Also, the pulse at the corner
becomes a local maximum/minimum in real contours; (ii) It is difficult to distinguish
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Fig. 9.6 The curvature of a 
real contour 














the curvatures of lines and arcs since the curvatures of contours with a large radius 
are greatly affected by noise. 

Now we discuss the curvatures of roads. The values of sampling points consist
of two items, real data $r(s)$ and random noise $\delta$. The ideal curvatures of lines and
arcs are 0 and $\pm 1/R$, respectively.

Consequently, the curvature of the real line is a random function, $C_l(s) = \delta_l$.
Similarly, that of the real arc is described as $C_c(s) = \pm 1/R + \delta_c$. From the statistical
properties of random errors, the sum of zero-mean random errors tends to zero, and we have

$$\sum_{c=1}^{N} C_c(s) = \pm\sum_{c=1}^{N}\frac{1}{R} + \sum_{c=1}^{N}\delta_c \approx \pm\frac{N}{R}, \qquad \sum_{l=1}^{N} C_l(s) = \sum_{l=1}^{N}\delta_l \approx 0, \qquad (9.8)$$

where $N$ is the number of points on the contour.
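A small simulation illustrates why the accumulated curvature separates lines from arcs even when single samples do not. The segment length, radius, and noise level below are made-up values for illustration, not data from the experiments, and the noise is assumed to be zero-mean Gaussian.

```python
import random

def accumulated_curvature(curvatures):
    """Sum of the per-point curvature samples of one contour segment."""
    return sum(curvatures)

def noisy_segment(true_curvature, n, sigma, rng):
    """n curvature samples: ideal value plus zero-mean Gaussian noise."""
    return [true_curvature + rng.gauss(0.0, sigma) for _ in range(n)]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
N, R, SIGMA = 1000, 50.0, 0.01  # illustrative values only

line_sum = accumulated_curvature(noisy_segment(0.0, N, SIGMA, rng))
arc_sum = accumulated_curvature(noisy_segment(1.0 / R, N, SIGMA, rng))
```

Here the noise level (0.01) is comparable to the arc curvature (0.02), so individual samples are hard to classify. The sums stay near 0 for the line and near $N/R = 20$ for the arc, because the noise sum grows only like $\sqrt{N}$.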


9.3.3 Calculating the Radius of an Arc 


The ideal equation of a circle is given in the following form

$$(x - x_0)^2 + (y - y_0)^2 = r^2, \qquad (9.9)$$

where $(x_0, y_0)$ is the center of the circle. It has another form as follows:

$$x^2 + y^2 - 2x_0x - 2y_0y + (x_0^2 + y_0^2 - r^2) = 0. \qquad (9.10)$$

Given observed points $p_i(x_i, y_i)$, $i = 1, 2, 3, \ldots, N$, we substitute them into (9.10),
and obtain the error function

$$\Delta_i = x_i^2 + y_i^2 - 2x_0x_i - 2y_0y_i + (x_0^2 + y_0^2 - r^2). \qquad (9.11)$$

From least-mean-square algorithms, we can obtain the radius $r$ of the ideal circle
by minimizing $\sum_{i=1}^{N}\Delta_i^2$. Defining $f(x_0, y_0, x_1, y_1, x_2, y_2, \ldots, x_N, y_N, r) = \sum_{i=1}^{N}\Delta_i^2$,
Fig. 9.7 The geometric relationship when solving for the radius of a circle
we have

$$\frac{\partial f}{\partial r} = 0. \qquad (9.12)$$
In Fig. 9.7, we define the equation of the line OA as y = kx + b. It is easy to obtain 
2 ur. 2 
k= m and b — LAM Then we can obtain the solution of (xo, yo) by 
using two conditions. First, when y; Æ y», 
xg est Abo) +OF -YDIOi 70 
2Y iH Oyi) ' (9.13) 
yo = kxo +b, 
when y; = y». 
— iz 
X0 I l 2 z , 
yo — Ele] D-2 +O- N) Td 


2^ i(yi—y12 


Now, the radius of the circle is calculated from the distance between $(x_0, y_0)$ and
$(x_i, y_i)$:

$$r = \sqrt{(x_i - x_0)^2 + (y_i - y_0)^2}. \qquad (9.15)$$
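For comparison, the sketch below fits a circle by linear least squares on the error term of (9.11). It is an algebraic (Kasa-style) fit, swapped in here as an illustration, and is not the closed-form solution of (9.13) and (9.14). The substitution $A = -2x_0$, $B = -2y_0$, $C = x_0^2 + y_0^2 - r^2$ and all function names are assumptions of this example.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = sum(m[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = (m[i][3] - s) / m[i][i]
    return x

def fit_circle(points):
    """Least-squares circle through points; returns (x0, y0, r)."""
    sx = sy = sxx = syy = sxy = sxz = syz = sz = 0.0
    for x, y in points:
        z = x * x + y * y
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
        sxz += x * z
        syz += y * z
        sz += z
    n = float(len(points))
    # Normal equations for x^2 + y^2 + A x + B y + C = 0 in the unknowns (A, B, C).
    A, B, C = solve3([[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]],
                     [-sxz, -syz, -sz])
    x0, y0 = -A / 2.0, -B / 2.0
    r = math.sqrt(x0 * x0 + y0 * y0 - C)
    return x0, y0, r
```

At least three non-collinear points are needed. For nearly straight segments the system becomes ill-conditioned and the fitted radius is unreliable, which is one reason to classify a segment as a line or an arc before fitting.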


9.3.4 The Algorithm Flow 


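1. Smooth the sampled data, and calculate the curvature at each sampling point, thus
yielding the curvature curve. The curvature at point $(x, y)$ is represented by

$$\kappa(x, y) = \frac{\dot{x}\ddot{y} - \dot{y}\ddot{x}}{(\dot{x}^2 + \dot{y}^2)^{3/2}}, \qquad (9.16)$$

where $\dot{x} = dx/dt$, $\ddot{x} = d^2x/dt^2$, $\dot{y} = dy/dt$, $\ddot{y} = d^2y/dt^2$. Furthermore, its
discrete form is

$$\dot{x}_k = x_{k+1} - x_{k-1}; \qquad \ddot{x}_k = x_{k+1} + x_{k-1} - 2x_k. \qquad (9.17)$$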
2. Difference operations are sensitive to noise. Therefore, we need to smooth the
curvature of curves before seeking the extremes of curvature. In practice,
Gaussian filters can remove the effect of random noise, so we use them to
smooth the curvature.

The 1-D Gaussian smoothing filter is given by

$$h(t, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{t^2}{2\sigma^2}}. \qquad (9.18)$$

The smoothed curve is

$$X(t, \sigma) = x(t) * h(t, \sigma), \qquad Y(t, \sigma) = y(t) * h(t, \sigma), \qquad (9.19)$$

where $\sigma$ is the standard deviation and $*$ is the convolution operator.


3. Seek the local extremes of the curvature, which correspond to the corners of real
contours.

4. Merge the corners whose distance is smaller than $\sigma$, and keep the corners $N_1$
with bigger curvature.

5. Accumulate the curvature between two neighboring corners, thus yielding the
curve of accumulated curvature.

6. Calculate the intersection points of the curve of accumulated curvature between
the two neighboring corners, which correspond to the tangent points $N_2$ of the
contour of the trace.

7. Calculate the final segmenting point set $N = N_1 \cup N_2$.

8. Make a decision on contour types, either lines or arcs, from the curvature and
accumulated curvature of contours.

9. Calculate the radius $r$ between two segmenting points, and fit the shapes of each
road segment.
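The first three steps of the flow (smoothing, discrete curvature, and the search for curvature extremes) can be sketched as follows. This is an illustration under stated assumptions: the Gaussian kernel is truncated at about three standard deviations and renormalised, indices are clamped at the ends of the data, and the first difference is halved so that the result approximates geometric curvature (this differs from (9.17) only by a constant scale). All function names are illustrative.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Sampled, normalised 1-D Gaussian h(t, sigma)."""
    if radius is None:
        radius = int(3 * sigma) + 1
    weights = [math.exp(-(t * t) / (2.0 * sigma * sigma))
               for t in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(values, sigma):
    """Convolve values with the Gaussian kernel, clamping indices at the ends."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(values) - 1
    return [sum(w * values[min(max(i + j - radius, 0), last)]
                for j, w in enumerate(kernel))
            for i in range(len(values))]

def curvature(xs, ys):
    """Discrete curvature at interior points using central differences."""
    kappa = []
    for k in range(1, len(xs) - 1):
        dx = (xs[k + 1] - xs[k - 1]) / 2.0   # halved so dx approximates dx/dt
        dy = (ys[k + 1] - ys[k - 1]) / 2.0
        ddx = xs[k + 1] + xs[k - 1] - 2.0 * xs[k]
        ddy = ys[k + 1] + ys[k - 1] - 2.0 * ys[k]
        denom = (dx * dx + dy * dy) ** 1.5
        kappa.append((dx * ddy - dy * ddx) / denom if denom else 0.0)
    return kappa

def local_extrema(values, threshold):
    """Indices where |value| is a local maximum above threshold (corner candidates)."""
    return [i for i in range(1, len(values) - 1)
            if abs(values[i]) > threshold
            and abs(values[i]) >= abs(values[i - 1])
            and abs(values[i]) > abs(values[i + 1])]
```

A typical use would be `kappa = smooth(curvature(smooth(xs, s), smooth(ys, s)), s)` followed by `local_extrema(kappa, c_t)`, where the smoothing scale `s` and threshold `c_t` are tuning parameters chosen for the data.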


To validate the proposed approach, we collected road data using DGPS on
the Springrobot platform, as shown in Fig. 9.8. First, we filter the road curve using
(9.19) (shown in Fig. 9.8(b)), where the blue solid line is the filtered
curve and ‘o’ denotes the original points. For better illustration, we take one point
every 0.5 m within each contour segment. Furthermore, we obtain the corner points
(red ‘o’ in Fig. 9.8(c)) of the road contour. Finally, we generate a fitted contour in
Fig. 9.8(d). To have a closer look at the performance of curve fitting, we list the
fitting errors of some sampling points in Table 9.2.


9.4 The Relationship Between Motor Pulses and the Front Wheel Lean Angle


To obtain the relationship between motor pulses and the angle of the frontal wheel,
we give pulse commands to the servomotor, and then measure the angle between the
Fig. 9.8 Road curve fitting (panel legends: after filtering; 0.5 m each point; corners of the curves; axes: distance (m) versus t (s))

Fig. 9.9 The relationship between motor pulses and the frontal wheel lean angle (axis: angle of frontal wheel)


frontal wheel and the central line of the vehicle. Similarly, we can get the relation- 
ship between motor pulses and the angle of steering wheel. The data is shown in 
Table 9.3. 
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Table 9.2 Error of the road curve fitting 


Point index Error (m) Point index Error (m) Point index Error (m) 
588 0 621 —0.022514 654 —0.043589 
589 —0.012631 622 —0.039874 655 —0.053452 
590 —0.020484 623 —0.052511 656 — 0.056682 
591 —0.024201 624 —0.060143 657 —0.054189
592 —0.024525 625 —0.062779 658 — 0.046865 
593 — 0.022203 626 — 0.06039 659 —0.035606 
594 —0.017965 627 —0.052993 660 —0.02133
595 —0.012483 628 —0.040492 661 —0.0049879 
596 —0.0064056 629 —0.022922 662 0.012451 
597 —0.00016625 630 0 663 0.029959 
598 0.0058921 631 0.0051631 664 0.046507 
599 0.011541 632 0.0069396 665 0.061063 
600 0.016601 633 0.0064731 666 0.072614 
601 0.021038 634 0.0046559 667 0.080203 
602 0.024897 635 0.0022215 668 0.082944 
603 0.028119 636 —0.00025257 669 0.080041 
604 0.03077 637 —0.0023255 670 0.070818 
605 0.032688 638 —0.0036638 671 0.054699 
606 0.033284 639 —0.0040793 672 0 

607 0.031133 640 —0.0035323 673 0 

608 0.023164 641 —0.0022453 674 0.0020647 
609 —2.8422e—014 642 —0.00074659 675 0.0032209 
610 —0.1623 643 0 676 0.0035146 
611 —0.38312 644 —0.0077284 677 0.0030564 
612 0.56721 645 —0.0097033 678 0.002007 
613 0.3719 646 —0.0073691 679 0.00056412 
614 0.26257 647 —0.0023915 680 —0.0010431 
615 0.19258 648 0.0034642 681 —0.0025696 
616 0.13888 649 0.0084324 682 —0.0037593 
617 0.095585 650 0.010715 683 —0.0043517 
618 0.058418 651 0.008503 684 —0.0040855 
619 0.026803 652 0 685 —0.0027133 


620 —0.00028067 653 —0.026126 686 —2.8422e—014 


From Fig. 9.9, we can obtain the regression function

$$\xi = 5.18 \times 10^{-5} P, \qquad (9.20)$$

where $\xi$ is the angle of the frontal wheel, and $P$ is the number of motor pulses.
We would like to point out that a fuzzy controller is used as the lateral controller.
For the details, we refer to [16].
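The order of magnitude of the coefficient in (9.20) can be checked against the measurements in Table 9.3. The sketch below fits an ordinary least-squares line (with an intercept, which is an assumption about how the regression was done) to the motor pulse and frontal wheel angle columns. The slope should come out at about $5 \times 10^{-5}$ per pulse.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Motor pulses and angle of frontal wheel, copied from Table 9.3.
pulses = [0, 22225, 44444, 66667, 88889, 111111, 133333, 155556, 177778,
          200000, 222222, 244444, 266667, 288889, 311111, 333333, 355556,
          377778, 400000, 422222, 444444, 466667, 488889, 511111, 533333,
          555556, 577778, 600000]
angle = [0.0, 0.474122, 1.173539, 2.276610, 3.622666, 4.913494, 6.100710,
         7.361193, 8.577056, 9.769596, 11.01529, 12.24955, 13.47108,
         14.70033, 15.91294, 17.0687, 18.22439, 19.37022, 20.50419,
         21.63254, 22.74814, 23.85487, 24.94044, 26.01698, 27.05728,
         28.1043, 29.15082, 29.93288]

slope, intercept = fit_line(pulses, angle)

def wheel_angle(p):
    """Predicted frontal wheel angle for p motor pulses."""
    return slope * p + intercept
```

The first few rows of the table rise more slowly than the rest, which suggests a small dead zone near zero pulses. A single straight line is therefore least accurate for small steering commands.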
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Table 9.3 The relationship between the angle of steering wheel, motor pulses, and the angle of 
frontal wheel 


Sequence   Angle of steering wheel   Motor pulses   Angle of frontal wheel


1 0 0 0 

2 20 22225 0.474122 
3 40 44444 1.173539 
4 60 66667 2.276610 
5 80 88889 3.622666 
6 100 111111 4.913494 
7 120 133333 6.100710 
8 140 155556 7.361193 
9 160 177778 8.577056 
10 180 200000 9.769596 
11 200 222222 11.01529 
12 220 244444 12.24955 
13 240 266667 13.47108 
14 260 288889 14.70033 
15 280 311111 15.91294 
16 300 333333 17.0687 

17 320 355556 18.22439 
18 340 377778 19.37022 
19 360 400000 20.50419 
20 380 422222 21.63254 
21 400 444444 22.74814 
22 420 466667 23.85487 
23 440 488889 24.94044 
24 460 511111 26.01698 
25 480 533333 27.05728 
26 500 555556 28.1043 

27 520 577778 29.15082 
28 540 600000 29.93288 
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Chapter 10 
Longitudinal Motion Control for Intelligent 
Vehicles 


10.1 Introduction 


From a system viewpoint, driving control tasks consist of giving pulse commands to
the throttle, brakes, and steering wheel of the vehicle body, thus changing the vehicle
state through the vehicle dynamics. As we know, lateral road departures and longitudinal
collisions are the main sources of traffic accidents. Accordingly, the goal of longitudinal
control is to control a vehicle according to its relative position with respect to either
the lead vehicle or obstacles. Many approaches have been proposed to follow the
lead vehicle since the 1960s [3, 5, 8, 9].

The main longitudinal control approaches include PID approaches, mixed inte- 
ger linear programming [3], backing control [10, 11], fuzzy control [12, 13], and 
neural control [6, 7]. As the seminal work, the ALVINN used a single hidden layer 
back-propagation network to control NAVLAB by directly inputting a 30 x 30 unit 
2D image after training, thus keeping the vehicle on the road [6, 7]. In the early 
stages, people selected either lateral control or longitudinal control for different ap- 
plications, not attempting to integrate the lateral and longitudinal control. Actually, 
this is the basic assumption of the PATH control system [8]. In many applications 
of intelligent vehicles, lateral control and longitudinal control are closely related. 
Hence, Li et al. investigated tire/road friction modeling for integrated lateral con- 
trol and longitudinal control [4]. Moreover, many controller strategies incorporated 
a modeling strategy of human driving behavior. As an example, Kim et al. pro- 
posed using Piecewise Polynomial (PWP) model to represent the mapping from the 
driver's sensing information to the driving operations [3]. 

Figure 10.1 illustrates the framework of the control system. This framework con- 
sists of a controller group, an executive module, feedback sensors, and a vehicle 
body. 

The rest of this chapter is organized as follows. Section 10.2 introduces the sys- 
tem identification in the vehicle longitudinal control. Section 10.3 presents the pro- 
posed controller. Section 10.4 validates the proposed controller. 
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Fig. 10.1 The framework of a control system 


10.2 System Identification in Vehicle Longitudinal Control 


To control a vehicle better, it is necessary to estimate the dynamical model of the
vehicle. The commonly used dynamical models are described as

$$\text{First-order systems:}\quad H(s) = \frac{K_0}{T_0 s + 1}, \qquad (10.1)$$

$$\text{First-order lag systems:}\quad H(s) = \frac{K_0}{T_0 s + 1}e^{-\tau s}, \qquad (10.2)$$

$$\text{Second-order systems:}\quad H(s) = \frac{K_0}{(T_1 s + 1)(T_2 s + 1)}, \qquad (10.3)$$

$$\text{Second-order lag systems:}\quad H(s) = \frac{K_0}{(T_1 s + 1)(T_2 s + 1)}e^{-\tau s}, \qquad (10.4)$$

where $K_0$; $T_0$, $T_1$, $T_2$; and $\tau$ are the magnifying coefficient, the time constants, and the
time lag, respectively. Usually, we can identify those parameters from the step response
curve. In practice, either first-order systems in (10.1) or first-order lag systems in (10.2)
can approximate the real system very well. We briefly discuss how to select a system
model and its parameters below.


10.2.1 The First-Order Systems 


Given a step signal $x_0$, we can get its stable output $y(\infty)$ from Fig. 10.2(a).
Afterwards, we can calculate both $K_0$ and $T_0$ according to the following steps.

• Calculate $K_0$ by

$$K_0 = \frac{y(\infty)}{x_0}. \qquad (10.5)$$



Fig. 10.2 The step response curve: (a) the input function x(t); (b) the response function y(t) of 
x(t); (c) the normalized response function y* (t) of x(t) 


• To calculate $T_0$, first normalize the step response shown in Fig. 10.2(b) by

$$y^*(t) = \frac{y(t)}{y(\infty)}, \qquad (10.6)$$

and get the solution of $y^*(t)$ (shown in Fig. 10.2(c)) as

$$y^*(t) = 1 - e^{-t/T_0}. \qquad (10.7)$$

Therefore, we get

$$T_0 = \frac{-t}{\ln(1 - y^*(t))}. \qquad (10.8)$$

Now, we select two points from the normalized curve, namely $y^*(t_1) = 0.333$ and
$y^*(t_2) = 0.632$, and calculate the corresponding time constants

$$T_1 = \frac{-t_1}{\ln(1 - y^*(t_1))} \approx 2.5t_1, \qquad T_2 = \frac{-t_2}{\ln(1 - y^*(t_2))} \approx t_2. \qquad (10.9)$$

If $T_1 \approx T_2$, we obtain $T_0 = \frac{T_1 + T_2}{2}$. However, when $(T_1 - T_2) > \varepsilon$ (a constant), we
will adopt either second-order systems or first-order lag systems.

We can evaluate how well the first-order system function approximates the normalized
$y^*(t)$ at $t_3 = T_0/2$ and $t_4 = 2T_0$. From (10.7), we have

$$y^*(t_3) = 1 - e^{-1/2} = 0.39, \quad t_3 = T_0/2; \qquad y^*(t_4) = 1 - e^{-2} = 0.87, \quad t_4 = 2T_0. \qquad (10.10)$$

If the values of $y^*(t)$ from Fig. 10.2 at times $t_3$ and $t_4$ are remarkably different from
$y^*(t_3)$ and $y^*(t_4)$ in (10.10), the approximation error is large. As a result, we have to
select another system function, such as a first-order lag system function.
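The two-point procedure for a first-order model can be sketched in Python as follows. The function names are illustrative. The code applies (10.5) and (10.8) directly and averages the two time constants, leaving the decision on whether they are close enough to the caller.

```python
import math

def time_constant(t, y_norm):
    """T0 from one point of the normalised step response, as in (10.8)."""
    return -t / math.log(1.0 - y_norm)

def identify_first_order(x0, y_inf, t1, y1, t2, y2):
    """Return (K0, T1, T2, T0) from the step size, the steady output and two points."""
    k0 = y_inf / x0
    t_const_1 = time_constant(t1, y1)
    t_const_2 = time_constant(t2, y2)
    return k0, t_const_1, t_const_2, (t_const_1 + t_const_2) / 2.0

def first_order_response(t, t0):
    """Normalised first-order step response y*(t) = 1 - exp(-t / T0)."""
    return 1.0 - math.exp(-t / t0)
```

For validation, `first_order_response(T0 / 2, T0)` and `first_order_response(2 * T0, T0)` give the reference values of (10.10), which can then be compared with the measured curve at the same times.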
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10.2.2 First-Order Lag Systems 


Similar to the first-order systems, we first normalize $y(t)$ into $y^*(t)$ by

$$y^*(t) = \frac{y(t)}{y(\infty)}. \qquad (10.11)$$

Therefore, the solution $y^*(t)$ is then

$$y^*(t) = \begin{cases} 0, & t < \tau, \\ 1 - e^{-\frac{t - \tau}{T_0}}, & t \ge \tau. \end{cases} \qquad (10.12)$$

Similar to the first-order systems, we take the values $y^*(t_1)$ and $y^*(t_2)$ at
times $t_1$ and $t_2$ in Fig. 10.2. Afterwards, we can calculate $T_0$ and $\tau$ by solving

$$y^*(t_1) = 1 - e^{-\frac{t_1 - \tau}{T_0}}, \qquad y^*(t_2) = 1 - e^{-\frac{t_2 - \tau}{T_0}}. \qquad (10.13)$$

Assuming $t_2 > t_1 > \tau$ in (10.13), we take logarithms and obtain

$$\ln(1 - y^*(t_1)) = -\frac{t_1 - \tau}{T_0}, \qquad \ln(1 - y^*(t_2)) = -\frac{t_2 - \tau}{T_0}. \qquad (10.14)$$

Hence, from (10.14), we have

$$T_0 = \frac{t_2 - t_1}{\ln(1 - y^*(t_1)) - \ln(1 - y^*(t_2))}, \qquad \tau = \frac{t_2\ln(1 - y^*(t_1)) - t_1\ln(1 - y^*(t_2))}{\ln(1 - y^*(t_1)) - \ln(1 - y^*(t_2))}. \qquad (10.15)$$

For computing convenience, we take $y^*(t_1) = 0.390$, $y^*(t_2) = 0.632$, and then we
have

$$T_0 = 2(t_2 - t_1), \qquad \tau = 2t_1 - t_2. \qquad (10.16)$$

After obtaining $T_0$ and $\tau$, we check $y^*(t)$ at times $t_3$, $t_4$, and $t_5$:

$$y^*(t_3) = 0, \quad t_3 < \tau; \qquad y^*(t_4) = 0.55, \quad t_4 = 0.8T_0 + \tau; \qquad y^*(t_5) = 0.865, \quad t_5 = 2T_0 + \tau. \qquad (10.17)$$

If the values of the normalized curve at times $t_3$, $t_4$ and $t_5$ are remarkably different
from the above values, we could further validate second-order systems. To simplify
our exposition, we do not discuss this validation further. For the identification
of a second-order system, we refer to [1].
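The solution of the two logarithmic equations in (10.14) for $T_0$ and $\tau$ is easy to code. The sketch below gives both the exact two-point solution and the rounded shortcut of (10.16). The function names are illustrative.

```python
import math

def identify_first_order_lag(t1, y1, t2, y2):
    """Solve (10.14) for (T0, tau), given two points with t2 > t1 > tau."""
    l1 = math.log(1.0 - y1)
    l2 = math.log(1.0 - y2)
    t0 = (t2 - t1) / (l1 - l2)
    tau = (t2 * l1 - t1 * l2) / (l1 - l2)
    return t0, tau

def shortcut(t1, t2):
    """Approximation (10.16), valid when y*(t1) = 0.390 and y*(t2) = 0.632."""
    return 2.0 * (t2 - t1), 2.0 * t1 - t2
```

The shortcut rounds $1/(\ln 0.61 - \ln 0.368) \approx 1.98$ to 2, so its results differ slightly from the exact solution. The difference is largest for $\tau$, which is computed as a small difference of two larger numbers.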
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Fig. 10.3 The brake system 
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Fig. 10.4 The response curve of a step signal 


10.2.3 Identification of Our Vehicle System 


In this section, we explain how to identify the velocity model. Our test field is a flat
road whose length is 3 km. Figure 10.3 shows the brake system, where $l_1$ is the dead
zone and $l_2$ is the effective range. In our experiment, we input 1800 pulses before
the vehicle starts to accelerate, and get $y(\infty) = 18$ after inputting 2700 pulses (shown in
Fig. 10.4).


1. The First-Order System Assumption The magnifying coefficient $K_0$ is
calculated by (10.5) and is

$$K_0 = \frac{y(\infty)}{x_0} = \frac{18}{2700 - 1800} = 0.02. \qquad (10.18)$$

We take two points from the normalized curve in Fig. 10.4, namely $y^*(t_1) = 0.330$
and $y^*(t_2) = 0.632$, and then calculate the time constants

$$T_1 = \frac{-t_1}{\ln(1 - 0.330)} \approx 2.5t_1, \qquad T_2 = \frac{-t_2}{\ln(1 - 0.632)} \approx t_2. \qquad (10.19)$$

Since $t_1 = 5.6$ s and $t_2 = 12.7$ s are observed in our experiment, $T_1 \approx T_2$. As a result,
$T_0 = \frac{T_1 + T_2}{2} = 13.35$ s.
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Now we obtain the first-order system 


$$H(s) = \frac{0.02}{13.35s + 1} \qquad (10.20)$$

and validate the normalized values of the step response at times $t_3 = T_0/2$ and $t_4 = 2T_0$.
From (10.7), we have

$$y^*(T_0/2) = 1 - e^{-1/2} = 0.39, \qquad y^*(2T_0) = 1 - e^{-2} = 0.87. \qquad (10.21)$$


The corresponding values of normalized curve of step response at times t3 and t4 
are 0.405 and 0.833, respectively. As a result, the error between the actual and the 
ideal system is very small. 


2. Validating the First-Order Lag Assumption In Fig. 10.4, there exists a sudden
slope change, which indicates a time lag. According to the curve of Fig. 10.4, we
have $y^*(t_1) = 0.390$ and $y^*(t_2) = 0.632$, where $t_1 = 6.6$ s and $t_2 = 12.7$ s; hence, we
get $T_0$ and $\tau$ by (10.16)

$$T_0 = 2(t_2 - t_1) = 12.2, \qquad \tau = 2t_1 - t_2 = 0.5. \qquad (10.22)$$

That is,

$$H(s) = \frac{0.02}{12.2s + 1}e^{-0.5s}. \qquad (10.23)$$

Now, we take the values of the normalized curve at times $t_3 = 0.8T_0 + \tau$ and
$t_4 = 2T_0 + \tau$ to validate the model

$$y^*(t_3) = 0.55, \quad t_3 = 0.8T_0 + \tau; \qquad y^*(t_4) = 0.865, \quad t_4 = 2T_0 + \tau. \qquad (10.24)$$

The values of the normalized curve at times $t_3$ and $t_4$ are 0.55 and 0.8535,
respectively. As a result, the error between the ideal first-order lag system and the
Springrobot system model is very small.


3. Validating Second-Order Assumption According to the curve of Fig. 10.4,
we have $y^*(t_1) = 0.4$ and $y^*(t_2) = 0.8$, where $t_1 = 6.8$ s and $t_2 = 21.225$ s. Hence,
we get $T_1$ and $T_2$ [1] as

$$T_1 + T_2 \approx \frac{t_1 + t_2}{2.16} \approx 12.975, \qquad \frac{T_1T_2}{(T_1 + T_2)^2} \approx 1.74\frac{t_1}{t_2} - 0.55 \approx 0.00746. \qquad (10.25)$$
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Fig. 10.5 The simulated results with different time constants Tọ 


From (10.25), we can see that one of $T_1$ and $T_2$ is very large while the other is very
small, and $t_1/t_2 = 0.32$, which means that the system function of Springrobot is a
first-order model. Therefore, we have

$$T_0 \approx \frac{t_1 + t_2}{2.12} \approx 13.220. \qquad (10.26)$$

Now, we can determine that the velocity model of the Springrobot is a first-order
lag system with $T_0 \in [12.2\ \text{s}, 13.35\ \text{s}]$, $K_0 = 0.02$ and $\tau = 0.5$ s.
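The arithmetic behind (10.25) and (10.26) can be reproduced in a few lines. The constants 2.16, 1.74, 0.55 and 2.12 are the empirical two-point constants for readings taken at $y^* = 0.4$ and $y^* = 0.8$, and the function name is illustrative.

```python
def second_order_two_point(t1, t2):
    """Two-point estimates: returns (T1 + T2, T1*T2 / (T1 + T2)**2)."""
    time_sum = (t1 + t2) / 2.16
    ratio = 1.74 * t1 / t2 - 0.55
    return time_sum, ratio

# Times at which the normalised response reaches 0.4 and 0.8 (Sect. 10.2.3).
t1, t2 = 6.8, 21.225

time_sum, ratio = second_order_two_point(t1, t2)
t0_first_order = (t1 + t2) / 2.12   # first-order estimate used when t1/t2 is about 0.32
```

A ratio $T_1T_2/(T_1 + T_2)^2$ close to zero means that one time constant dominates the other, which is why the model collapses to first order here.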


10.3 The Proposed Velocity Controller 


10.3.1 Validating the Longitudinal Control System Function 


In Sect. 10.2, we identified the system model of Springrobot as the first-order lag
system described as

$$H(s) = \frac{0.02}{T_0 s + 1}e^{-0.5s}, \qquad (10.27)$$

where $T_0 \in [12.20\ \text{s}, 13.35\ \text{s}]$. The system responses for $T_0 = 12.20$ s and
$T_0 = 13.35$ s are shown in Fig. 10.5.

We can estimate the accuracy of the system model. When $T_0 = 12.20$ s,


0.780 


0.9137 0:913 x 100% ~ 0.219%, t=230s. 


1 O.787-0-7189 x 100% © 0.897%, t=20s; 
(10.28) 
1 = 
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Fig. 10.6 The structure of the velocity control system 


When $T_0 = 13.35$ s,


1 0.780—0.765 
— 0.780-0.765 x 100% ~ 1.932%, t=20s: 
| 0.780 (10.29) 
5 — 


O9 SSO x 100% + 3.169%, 1 —30s. 


From the previous analysis, we can see that the system with $T_0 = 12.20$ s is better
than that with $T_0 = 13.35$ s. Hence, the system function of the longitudinal system
is described as

$$H(s) = \frac{0.02}{12.20s + 1}e^{-0.5s}. \qquad (10.30)$$
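To compare candidate time constants against measured data, the step response of the first-order lag model can be evaluated in closed form. The sketch below assumes a step of 900 pulses (the difference between the 2700 and 1800 pulses mentioned in Sect. 10.2.3). The function name and the sampled times are illustrative.

```python
import math

def lag_step_response(t, k0, t0, tau, x0=1.0):
    """Step response of H(s) = K0 * exp(-tau * s) / (T0 * s + 1) to a step of size x0."""
    if t < tau:
        return 0.0                      # nothing happens before the time lag
    return k0 * x0 * (1.0 - math.exp(-(t - tau) / t0))

K0, TAU, STEP = 0.02, 0.5, 900.0

for t0 in (12.20, 13.35):
    # Normalised response at 20 s and 30 s for each candidate time constant.
    samples = [lag_step_response(t, K0, t0, TAU, STEP) / (K0 * STEP)
               for t in (20.0, 30.0)]
    print(t0, [round(s, 3) for s in samples])
```

Comparing these normalised values with the measured curve at the same instants gives relative errors of the kind reported for the two candidate values of $T_0$.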


10.3.2 Velocity Controller Design 


The proposed velocity controller adopts a cascade control scheme that combines
throttle control and brake control (shown in Fig. 10.6). The inner loop is a fuzzy
adaptive robust control module, and the outer loop is an improved Single-Neuron
adaptive PID (SN-PID) control module based on the quadratic performance index.
Although its structure is simple, this velocity controller tolerates changes in the
environment, which makes it robust in complex environments. We introduce the
SN-PID below since it plays an important role in our system. Jetal et al. proposed
the SN-PID, which exploits the self-learning and self-adaptation properties of a
single neuron [2]. The SN-PID not only has a simple structure, but also adapts to
environment change. Its structure is shown in Fig. 10.7, where $y_r(k)$ and $y(k)$ are
its input and output variables; $x_1(k)$, $x_2(k)$, and $x_3(k)$ are the outputs of the converter,


$$\begin{aligned}
x_1(k) &= y_r(k) - y(k) = e(k), \\
x_2(k) &= e(k) - e(k-1) = \Delta e(k), \\
x_3(k) &= e(k) - 2e(k-1) + e(k-2) = \Delta e(k) - \Delta e(k-1),
\end{aligned} \tag{10.31}$$
Fig. 10.7 The control structure of the SN-PID


and $K > 0$ is the proportionality factor of the neuron. Hence, the SN-PID control
algorithm is described as

$$\Delta u(k) = u(k) - u(k-1) = K \sum_{i=1}^{3} \omega_i(k)\, x_i(k), \tag{10.32}$$


where $\omega_i(k)$ is the weight of $x_i(k)$. Both $K$ and $\omega_i(k)$ can be adjusted by
self-learning. Many parameter learning rules exist, each corresponding to a different
control algorithm. Here, we adopt the quadratic performance index to learn those
parameters, namely


$$E = \frac{1}{2}\left\{ P\,[y_r(k+d) - y(k+d)]^2 + Q\,\Delta u^2(k) \right\}, \tag{10.33}$$

where $d$ is the delay time, $P$ is the weight of the output error, and $Q$ is the weight
of the control increment.
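The quantities defined in (10.31) to (10.33) translate directly into code. The following is a minimal sketch, assuming the squared-error form of the performance index and an un-normalized weighted sum for the control increment; the function names are hypothetical and chosen only for illustration.

```python
def converter(e_k, e_k1, e_k2):
    """Map the errors e(k), e(k-1), e(k-2) to the neuron inputs of (10.31)."""
    x1 = e_k                       # current tracking error e(k)
    x2 = e_k - e_k1                # first difference of the error
    x3 = e_k - 2.0 * e_k1 + e_k2   # second difference of the error
    return [x1, x2, x3]


def control_increment(K, weights, x):
    """Control increment K * sum_i w_i(k) * x_i(k), as in (10.32)."""
    return K * sum(w * xi for w, xi in zip(weights, x))


def performance_index(P, Q, error_ahead, delta_u):
    """Quadratic index 0.5 * (P * e(k+d)^2 + Q * delta_u(k)^2), as in (10.33)."""
    return 0.5 * (P * error_ahead ** 2 + Q * delta_u ** 2)
```

A larger `P` penalizes tracking error more heavily, while a larger `Q` penalizes large changes in the control signal.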
Assume the system equation is formulated as

$$y(k+d) = -\sum_{i=1}^{n_a} a_i\, y(k+d-i) + \sum_{i=0}^{n_b} b_i\, u(k-i). \tag{10.34}$$


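The weight update is in the direction of the negative gradient of $E$ in (10.33):

$$\Delta\omega_i(k) = \omega_i(k+1) - \omega_i(k) = -\eta_i \frac{\partial E}{\partial \omega_i(k)} \tag{10.35}$$

$$= \eta_i K \left[ P\, b_0\, e(k+d)\, x_i(k) - Q K \sum_{j=1}^{3} \omega_j(k)\, x_j(k)\, x_i(k) \right], \tag{10.36}$$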


where $b_0$ is the initial value of the system's zero-state response to a unit step
input, which can be obtained by experiment.
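The step from the performance index to the weight update can be spelled out with the chain rule. The derivation below is a sketch under three assumptions: $e(k+d) = y_r(k+d) - y(k+d)$ with $y_r$ independent of the weights; $\partial y(k+d)/\partial u(k) = b_0$, read off from (10.34); and $\partial u(k)/\partial \omega_i(k) = \partial \Delta u(k)/\partial \omega_i(k) = K x_i(k)$, read off from (10.32) with $u(k-1)$ held fixed.

```latex
\begin{aligned}
\frac{\partial E}{\partial \omega_i(k)}
  &= -P\,e(k+d)\,\frac{\partial y(k+d)}{\partial u(k)}\,
      \frac{\partial u(k)}{\partial \omega_i(k)}
     + Q\,\Delta u(k)\,\frac{\partial \Delta u(k)}{\partial \omega_i(k)} \\
  &= -P\,b_0\,e(k+d)\,K\,x_i(k)
     + Q\,K \sum_{j=1}^{3} \omega_j(k)\,x_j(k)\,K\,x_i(k) \\
  &= -K \Bigl[ P\,b_0\,e(k+d)\,x_i(k)
     - Q\,K \sum_{j=1}^{3} \omega_j(k)\,x_j(k)\,x_i(k) \Bigr].
\end{aligned}
```

Multiplying by $-\eta_i$ gives the weight increment of (10.36).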
Therefore, we obtain the following equations

$$\begin{aligned}
u(k) &= u(k-1) + K \sum_{i=1}^{3} \bar{\omega}_i(k)\, x_i(k), \\
\bar{\omega}_i(k) &= \frac{\omega_i(k)}{\sum_{j=1}^{3} |\omega_j(k)|}, \\
\omega_i(k+1) &= \omega_i(k) + \eta_i K \left[ P\, b_0\, e(k+d)\, x_i(k) - Q K \sum_{j=1}^{3} \omega_j(k)\, x_j(k)\, x_i(k) \right].
\end{aligned}$$

Fig. 10.8 The simulation of speed: (a) unsupervised Hebb learning rule; (b) supervised Delta
learning rule; (c) supervised Hebb learning rule; (d) improved Hebb learning rule


In practice, we use $e(k)$ instead of $e(k+d)$ because $e(k+d)$ is not available at time $k$.
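One control step of the SN-PID with the quadratic-index learning rule can be sketched as follows. This is an illustrative reading of the equations above, not the authors' implementation: the class name `SNPID`, the default initial weights, and the guard against a zero weight norm are assumptions, and $e(k)$ stands in for $e(k+d)$ as the text describes. The learning rates in `etas` are matched by position to $x_1$, $x_2$, $x_3$.

```python
class SNPID:
    """Single-neuron adaptive PID with a quadratic-index weight update (sketch)."""

    def __init__(self, K, etas, P, Q, b0, weights=(0.1, 0.1, 0.1)):
        self.K, self.P, self.Q, self.b0 = K, P, Q, b0
        self.etas = list(etas)    # learning rates, one per neuron input
        self.w = list(weights)    # initial weights are an assumption
        self.e1 = self.e2 = 0.0   # stored errors e(k-1) and e(k-2)
        self.u = 0.0              # previous control output u(k-1)

    def step(self, y_ref, y):
        """Return u(k) for the reference y_ref and the measured output y."""
        e = y_ref - y
        x = [e, e - self.e1, e - 2.0 * self.e1 + self.e2]   # converter outputs

        # Control law with weights normalized by the sum of their magnitudes.
        norm = sum(abs(w) for w in self.w) or 1.0   # avoid division by zero
        w_bar = [w / norm for w in self.w]
        self.u += self.K * sum(wb * xi for wb, xi in zip(w_bar, x))

        # Gradient step on the quadratic index, with e(k) used for e(k+d).
        s = sum(w * xi for w, xi in zip(self.w, x))
        self.w = [
            w + eta * self.K * (self.P * self.b0 * e * xi - self.Q * self.K * s * xi)
            for w, eta, xi in zip(self.w, self.etas, x)
        ]

        self.e2, self.e1 = self.e1, e
        return self.u
```

In a closed-loop simulation, `step` would be called once per sampling period with the desired and measured speed, and the returned value applied to the plant model.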


10.4 Experimental Results and Analysis 


We validate the longitudinal system model using four learning rules: the unsupervised
Hebb learning rule, the supervised Delta learning rule, the supervised Hebb learning
rule, and the improved Hebb learning rule. In our experiments, $\eta_P = 2$, $\eta_I = 0.4$,
$\eta_D = 0.5$; $K$ is taken as 0.005, 0.075, 0.0045, and 0.085 for the four learning rules,
respectively. The experimental results are shown in Figs. 10.8 and 10.9. From the
experimental results, we can see that the value of $K$ affects the performance of the
controller, for example, its response time and overshoot. Furthermore, we validate
the SN-PID based on the quadratic performance index, shown in Fig. 10.10. In this
experiment, $P = 2$, $Q = 1$, $b_0 = 6$, $K = 0.02$, $\eta_P = 80$, $\eta_I = 0.4$, and $\eta_D = 259$.
Compared to other learning rules, this learning rule has a lower computing burden
and clearer physical meaning.

Fig. 10.9 Weights vary with different learning rules: (a) unsupervised Hebb learning rule;
(b) supervised Delta learning rule; (c) supervised Hebb learning rule; (d) improved Hebb
learning rule

Fig. 10.10 The SN-PID based on the quadratic performance index: (a) speed tracking;
(b) different weights